








VOLUME XXI NEW SERIES, NO. 154 


JOURNAL 


OF THE 


AMERICAN STATISTICAL 
ASSOCIATION 


JUNE, 1926 


CONTENTS 


RELATIVE MERITS OF CIRCLES AND BARS FOR REPRESENTING
COMPONENT PARTS. By Walter Crosby Eells


FORMATION OF FREQUENCY DISTRIBUTIONS.
By John B. Canning


FORECASTING THE PRICE OF WHEAT. By C. C. Bosland


A FREQUENCY CURVE ADAPTED TO VARIATION IN PERCENTAGE
OCCURRENCE. By Sewall Wright


SPURIOUS CORRELATION AND ITS SIGNIFICANCE TO PHYSIOLOGY.
By E. J. Gumbel


NOTES:
CLASSIFICATION OF CAUSES OF DEATH AND DEATH REGISTRATION.


AN AMERICAN POINT OF VIEW. By George H. Van Buren


DETERMINATION OF PAST AND PRESENT SECULAR TRENDS.
By Lincoln W. Hall


VARIABILITY OF MORTALITY BETWEEN MEN AND WOMEN IN COPENHAGEN
(213); PENNSYLVANIA DEPARTMENTAL STATISTICS (215); PROGRESS OF THE
CENSUS (218); SALES QUOTAS (220); MISCELLANEOUS NOTES (224).




PUBLISHED QUARTERLY BY THE 
AMERICAN STATISTICAL ASSOCIATION 
PUBLICATION OFFICE: Rumford Press, Concord, N. H.
EDITORIAL OFFICE: Columbia University, New York City
Price $1.50 per copy $5.00 per annum 

















THE EMPLOYMENT CLEARING HOUSE 
OF THE 
AMERICAN STATISTICAL ASSOCIATION 


will aid you with your employment problem. It makes no charge to employers for its 
services. Any member of the Association may register for a position. There is no fee. 


THE FOLLOWING POSITIONS WERE OPEN IN MAY:

[Table: positions by field, pay ($2500-, $3700-4900, $5000 and over) and sex (M, F). Fields listed: Agricultural Economics; Business—Mfg. and Selling; Drafting; Economic Research; Governmental; Social Work; Technical Computing.]

[Table: pay ($1800-, $3700-4900, $5000 and over) and sex (M, F). Fields listed: Advertising; Agricultural Economics; Business—Mfg. and Selling; Economic and Industrial; Finance; Insurance; Marketing; Social Work; Transportation; Vital Statistics.]

















COMMUNICATE YOUR NEEDS TO 
HELEN W. LINKE 


Secretary’s Assistant, American Statistical Association 
474 West 24th Street, New York City 


[NOT A PAID ADVERTISEMENT] 








NEW SERIES, NO. 154 (VOL. XXI) JUNE, 1926


JOURNAL OF THE AMERICAN 
STATISTICAL ASSOCIATION 


Formerly the Quarterly Publication of the American Statistical Association 








THE RELATIVE MERITS OF CIRCLES AND BARS FOR 
REPRESENTING COMPONENT PARTS 


By Walter Crosby Eells, Whitman College


INTRODUCTION 


The graphical representation of statistics has had a remarkable 
development in the past few years; the great value of this method, 
properly safeguarded, is generally recognized today. But of the 
possible graphic methods, many have not yet become standardized. 
The preliminary report of the joint committee of the national engi- 
neering and statistical organizations has done much to standardize 
graphic procedure, and its recommendations have deservedly been 
widely circulated and accepted.¹

But there are many phases of graphic methods which this report 
does not mention. For example, the very important matter of the 
best method of representing component parts is not touched by it. 
The two commonly used methods for exhibiting this class of facts 
graphically are by means of sectors of circles and by subdivisions of bars. 
Various opinions as to the relative merits of these two methods have 
been expressed by recent writers on statistics. 


OPINIONS OF AUTHORITIES 


Brinton, in his excellent pioneer work on graphic methods, says: 


The circle with sectors is not a desirable form of presentation. . . . If the
horizontal bar method were used as frequently as the sector method, it would
be found in every way more desirable, . . . and would be read much more
quickly² and accurately than the method involving sectors.³


1 Brinton, W. C. (Chairman), Preliminary Report of Joint Committee on Standards for Graphic Presen- 
tation, New York. Reprinted in McCall, W. A., How To Measure in Education, pp. 332-341; in Kelley, 
T. L., Statistical Method, pp. 42-43; in Haskell, A. C., How To Make and Use Graphic Charts, pp. 71-72, 
etc. 

2 In the quotations which follow, italics have been inserted by the author of this paper, and are not 
found in the originals, except where specifically noted. 

3 Brinton, W. C., Graphic Methods for Presenting Facts, New York, 1914, pp. 6-7.





























120 American Statistical Association [2 


Karsten, in his recent detailed and exhaustive work on charts and 
graphs, says: 

Few readers will judge quantities by either the arc at the perimeter or the 
subtended angles at the center—on the contrary most of them will judge entirely 
by the areas of the segments. . . . The disadvantages of the pie chart are 
many. It is worthless for study and research purposes. In the first place the 
human eye cannot easily compare as to length the various arcs about the circle, 
lying as they do in different directions. In the second place, the human eye is 
not naturally skilled in comparing angles—those angles at the center of the circle, 
formed by the various rays or radii and subtending the various arcs. In the 
third place, the human eye is not an expert judge of comparative sizes or areas, 
especially those as irregular as the segments of parts of the circle. There is no 
way by which the parts of this round unit can be compared so accurately and 








quickly as the parts of a straight line or bar. . . . The advantage of the pie
chart is psychological. It instantly commands the readers’ attention. A circle
is, of all geometrical patterns, the easiest resting spot for the eye. The fact is well
known to advertisers. . . . Before using it in place of the simpler and sounder
100 per cent bar, you should carefully gauge your audience or readers, and only
if you believe you have begun to strain their interest should you judiciously
insert the pie chart. In a sense, it might be construed as an insult to a man’s
intelligence to show him a pie chart.¹


Turning to the field of educational statistics, we find that McCall, 
in his well known manual, says: 


The sectioned-bar diagram is an even better graph for presenting component 
parts. It is in almost every respect superior to the sector diagram. Visual 
comparisons of the components are easier.²


Or in the field of economic statistics, Jerome says: 


The circle chart . . . is more difficult than the bar chart to read with
substantial accuracy, but it has a certain popularity. . . . This type is popular
for the representation of the distribution of a dollar of expense, taxes, or family
budget, and is quite appropriate for that purpose. However, even in such cases
the same facts could probably be shown with equal ease and clarity by the
ordinary bar diagram.³

And finally Secrist, in the revised edition of his popular text book, 
is particularly severe in his detailed criticism and condemnation of the 
circle chart to show component parts. The short paragraph of his 
first edition is expanded into four pages of detailed criticism. Two 
brief extracts follow: 

When it [the circle] is divided into components, the parts appear⁴ to stand
in the relation of their respective chords. But this is not the case, since the 


smaller the sector, the longer the chord relative to its corresponding arc, and 
vice versa. The areas of the sectors are proportional to their respective arcs, 


1 Karsten, K. G., Charts and Graphs, New York, 1923, pp. 90-95. 

2 McCall, W. A., How To Measure in Education, New York, 1922, p. 348. 
3 Jerome, H., Statistical Method, New York, 1924, pp. 97, 66. 

4 The italics are in the original. 











Circles and Bars for Representing Component Parts 121 


but not to their respective chords. But it is the arcs which cannot be easily 
compared—they are circular—and relative lengths are not apparent. To be 
compared, they must be straightened out in the mind. The ease with which 
this can be done varies inversely with their length. All radii of a circle, of course, 
are equal and the lengths of the arcs are proportional to the angles at the center. 
But it is as difficult to compare the relative sizes of the angles as it is the lengths
of the arcs.
A pie diagram is a clumsy and defective method of illustrating component parts;
a bar of uniform width . . . is much more satisfactory.¹

The mathematical basis of this condemnation, comparisons by 
means of chords, is especially interesting in view of later results re- 
ported in this paper. Secrist further illustrates it by a detailed dia- 
gram (p. 183) of the various sectors of a four-component circle, strung 
along a line in different ways to show the difficulty and errors of
comparisons based upon chords. This circle was used (as circle M) in the
experiment described below. He also quotes approvingly the greater
part of the quotation given above from Karsten.
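Secrist's argument rests on how four measures of a sector grow with the share it represents. As a minimal sketch (Python; the function name `sector_measures` is hypothetical, and the radius of 2.3 cm is simply half the 4.6 cm diameter of the circles used in the experiment), the block below computes the central angle, arc, area, and chord for a given percentage. Angle, arc, and area are all exactly proportional to the percentage; the chord, 2r sin(θ/2), is not, and its ratio to the arc falls as the sector grows, which is the geometric fact behind Secrist's remark that small sectors have relatively long chords.

```python
import math

RADIUS_CM = 2.3  # half of the 4.6 cm circle diameter used in the experiment

def sector_measures(percent, r=RADIUS_CM):
    """Angle (degrees), arc, area, and chord of a sector holding `percent` of a circle."""
    theta = 2 * math.pi * percent / 100          # central angle in radians
    return {
        "angle_deg": math.degrees(theta),
        "arc": r * theta,                         # arc length: proportional to theta
        "area": 0.5 * r * r * theta,              # sector area: proportional to theta
        "chord": 2 * r * math.sin(theta / 2),     # straight line joining the arc's ends
    }

if __name__ == "__main__":
    # The four sectors of circle M: 33, 28, 17, and 22 per cent.
    for p in (33, 28, 17, 22):
        m = sector_measures(p)
        print(f"{p:3d}%  angle={m['angle_deg']:6.1f}  arc={m['arc']:5.2f}  "
              f"area={m['area']:5.2f}  chord={m['chord']:5.2f}  "
              f"chord/arc={m['chord'] / m['arc']:.3f}")
```

For circle M the chord-to-arc ratio runs from about 0.95 for the 17 per cent sector down to about 0.83 for the 33 per cent sector, while area and arc keep a fixed ratio to each other.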

Lack of space forbids the quotation of similar criticisms of circle 
diagrams which are given by other recent authors in the statistical 
field—general, educational, or economic. Among them may be
mentioned Marshall, Gavett, Alexander, Williams, and Chaddock.²


SUMMARY OF CRITICISMS 


These representative quotations, together with similar criticisms by
the authors mentioned in the last paragraph, form a very severe indict-
ment of the use of circular diagrams to show component parts. Not a
single author has been found who strongly favors their use. The criti-
cisms expressed by the authors mentioned may be summarized as
follows:
1. Circle diagrams to show component parts cannot be read as rapidly
or as easily as bar diagrams.
2. Circle diagrams to show component parts cannot be read as
accurately as bar diagrams.
This is the most important criticism expressed. If true, it 
states a fundamental fault. 
3. The cause of the inaccuracy of circle diagrams for showing compo- 
nent parts lies in the method by which their components are judged. 
a. They are judged largely by areas, and therefore inac- 
curately. 


1 Secrist, H., An Introduction to Statistical Methods (Revised Edition), New York, 1925, pp. 181-185; 
First Edition, New York, 1921, p. 166. 

2 Marshall, W. C., Graphical Methods, New York, 1921, p. 37; Gavett, G. I., A First Course in Statis- 
tical Method, New York, 1925, p. 47; Alexander, C., School Statistics and Publicity, New York, 1919, 
pp. 235, 253; Williams, J. H., Graphic Methods in Education, Boston, 1924, p. 195; Chaddock, R. E., 
Principles and Methods of Statistics, Boston, 1925, p. 431. 













b. They are judged largely by chords, and therefore inac- 
curately. 

c. The human eye cannot easily or accurately compare arcs 
or angles. 

4. Circle diagrams enjoy a certain unintelligent popularity—they
catch attention, but are unworthy of serious use; while their
popularity and psychological appeal are grudgingly admitted,
their use is “an insult to a man’s intelligence”!


It is noticeable that all of the quotations given seem to be purely 
matters of opinion—none show any evidence of an experimental basis 
of fact. It is the object of this paper to report the method and con- 
clusions of an experiment designed to test, by objective means, the 
validity of the criticisms expressed above. 

A page containing fifteen subdivided circles was given to each mem- 
ber of the class in general psychology in Whitman College. The stu- 
dents were instructed to think of each circle as representing 100 per 
cent and to write in each sector their best estimate of the percentage 
of the whole represented by that sector. They were told not to 
measure in any way, but to estimate as closely as possible, avoiding 
any fractional percentages; not to hurry, but to work steadily, in the 
order given, without skipping. At the end of five minutes each student 
marked the circle on which he was working. Thirteen minutes were 
allowed for this work. At its conclusion, four possible methods of 
forming judgments were outlined and explained by figures on the black- 
board; the students were then asked to analyze their mental processes 
and to indicate, if possible, the particular method which was used by 
each one. The four methods were as follows: 


a. By Areas of Sectors

b. By Central Angles

c. By Arcs on the Circumference

d. By Subtending Chords


Figure I is a reproduction of the sheet as used, except for the fact 
that the correct percentage figures have been added for reference. 
Each circle was 4.6 centimeters in diameter. 

After an interval of three days, which included the week-end with 
its non-scholastic distractions, a page containing identically the same 
data, but represented by subdivided bar diagrams, was given to the 
class, with similar instructions. This page is reproduced as Figure II. 
The shorter bars were 10 centimeters in length, the longer ones 20 
centimeters. 













FIGURE I
[Fifteen subdivided circles, lettered A to O, with the correct percentages marked.]


The experiment was so arranged as to give to the bars all the
advantage of practice and of general familiarity with the nature of the
problem. A further advantage for the bars was due to their arrange-
ment on the page, with the common point of beginning, thus
facilitating comparisons between bars. At a subsequent meeting of the
class, about three-fourths of its members said that this arrangement of
the bars was a distinct aid to them in making comparisons. It was
definitely planned to make the conditions such that any possible ad-
vantage would accrue to the bars, so that if any superiority were shown
for the circles, there could be no question from this standpoint of its
reliability and significance.

FIGURE II
[The same data as Figure I, shown as subdivided bars lettered A to O, drawn from a common starting point.]

On the first day the class consisted of 97 members, of whom 35 were
men and 62 women. There were 51 sophomores, 38 juniors, and 8 sen- 
iors. Absences made these numbers slightly different on the second 
day, as indicated in the tables which follow. 

































RESULTS OF THE EXPERIMENT 


The results of the experiment will be presented under the following 
heads: 
Rapidity of Judgment 
Accuracy of Judgment 
Method of Judgment 
Popularity and Appeal 
Conclusions 




1. Rapidity of Judgment 
The rapidity score, based upon the number of circles or bars com- 
pleted at the end of five minutes, is shown in Table I. 


TABLE I

NUMBER OF STUDENTS COMPLETING THE GIVEN NUMBER OF CIRCLES AND
BARS IN FIVE MINUTES

Number of Figures          Number of Students     Number of Students
Completed in 5 Minutes     Completing Circles     Completing Bars
 5                                 5                      1
 6                                 9                      3
 7                                 6                     16
 8                                 8                     15
 9                                32                     24
10                                15                     25
11                                 7                      3
12                                 4                      2
13                                10                      2
14                                 1                      1
15                                 0                      1
Total                             97                     93
Median                           9.6                    9.5
Mean                             9.2                    9.0
…                                 …                     1.5
…                               0.14                   0.10











The advantage seems to be slightly in favor of the circles whether 
judged by the median or the mean. This difference would doubtless 
be greater, and possibly significant, if it were not for the practice effect 
of the first day’s work. At any rate the data seem to indicate that the 
criticism that circle diagrams cannot be read as easily or rapidly as bar 
diagrams is not substantiated. 
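The medians and means in Table I can be checked directly from the frequencies. In the sketch below (Python; function names are hypothetical), the class values 5 to 15 are inferred from the totals rather than read from the original. With them, counting each student at the number of figures completed reproduces the printed means (9.2 and 9.0), and interpolating within classes that run from k to k + 1 reproduces the printed medians (9.6 and 9.5). The same frequencies give 37 circle papers and 34 bar papers with ten or more figures, the group sizes used for the rapid workers in Table VI.

```python
# Table I frequencies: students completing 5, 6, ..., 15 figures in five minutes.
# The class values 5-15 are inferred, not read from the original table.
COMPLETED = list(range(5, 16))
CIRCLES = [5, 9, 6, 8, 32, 15, 7, 4, 10, 1, 0]
BARS = [1, 3, 16, 15, 24, 25, 3, 2, 2, 1, 1]

def grouped_mean(values, freqs):
    """Mean with every student in a class counted at the class value k."""
    return sum(v * f for v, f in zip(values, freqs)) / sum(freqs)

def grouped_median(values, freqs, width=1):
    """Interpolated median, treating class k as spanning k to k + width."""
    half = sum(freqs) / 2
    cum = 0
    for v, f in zip(values, freqs):
        if cum + f >= half:
            return v + (half - cum) / f * width
        cum += f
    raise ValueError("empty distribution")

def count_at_least(values, freqs, threshold):
    """Number of students who completed `threshold` or more figures."""
    return sum(f for v, f in zip(values, freqs) if v >= threshold)

for name, freqs in (("circles", CIRCLES), ("bars", BARS)):
    print(name, sum(freqs),
          round(grouped_median(COMPLETED, freqs), 1),
          round(grouped_mean(COMPLETED, freqs), 1),
          count_at_least(COMPLETED, freqs, 10))
```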


2. Accuracy of Judgment 


To study accuracy of judgment, the most important comparison to
consider, the amount of error in “points” was marked in each
figure. Thus a judgment of 22 per cent in circle A (25 per cent) was
scored as an error of 3 points. Several significant methods of studying
accuracy of judgment were then available. They are presented as
follows:
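The scoring rule amounts to an absolute difference, which is then averaged over the class to give the per-sector means reported in Tables II to IV. A minimal sketch (Python), assuming, as the circle A example suggests, that errors were recorded without sign; the function names and the four sample estimates are hypothetical, and only the 22-for-25 example comes from the text:

```python
def point_error(estimate, true_percent):
    """Error of one judgment, in percentage points (sign ignored)."""
    return abs(estimate - true_percent)

def sector_mean_error(estimates, true_percent):
    """Mean error, in points, of several students' estimates of one sector."""
    return sum(point_error(e, true_percent) for e in estimates) / len(estimates)

# The text's example: 22 per cent judged for the 25 per cent sector of circle A.
print(point_error(22, 25))                      # 3
# Hypothetical estimates from four students for the same sector.
print(sector_mean_error([22, 25, 27, 24], 25))  # 1.5
```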










(1) Average Error by Number of Subdivisions 
(2) Average Error by Sex 

(3) Average Error by Subdivisions 

(4) Large Errors 

(5) Comparison of Individuals 

(6) Comparison of Special Groups 


(1) Average Error by Number of Subdivisions 
The mean error in judgment for each sector and subdivision of a 
bar was computed, the results being summarized as follows: 


TABLE II

MEAN ERROR IN JUDGING CIRCLES AND BARS ACCORDING TO NUMBER OF
SUBDIVISIONS INVOLVED

Circles or Bars Having       Reference    Mean Error in       Mean Error in
                             Letter       Judging Circles     Judging Bars
Two Divisions                A–H               1.21                1.48
Three Divisions              I–K               1.77                1.54
Four Divisions               L–N               1.57                1.72
Five Divisions               O                 1.29                2.14

Mean, unweighted                               1.46                1.72
Mean, weighted proportional to number of sectors   1.44            1.64




















The advantage is clearly in favor of the circles for all groups of 
figures except those having three subdivisions. The general superiority 
of the circles for accuracy is clear, as shown by a comparison of either 
the weighted or unweighted mean. 

It is especially interesting to notice the increasing accuracy of
judgment in the case of the circles as the number of subdivisions is increased;
the opposite is true in the case of the bars, where the mean error
increases steadily with the number of subdivisions, until the mean error
for five subdivisions is almost twice as great for the bars as for the circles.
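The two summary rows of Table II differ only in weighting. A sketch (Python; the dictionary layout is an illustrative assumption) recomputes both from the group means. The unweighted mean averages the four group means; the weighted mean counts each group by its sectors: eight two-part figures give 16 sectors, three three-part figures 9, three four-part figures 12, and the single five-part figure 5. With those weights the printed 1.44 and 1.64 are reproduced, and the unweighted bar mean comes to 1.72.

```python
# Table II: (number of figures, sectors per figure, mean error circles, mean error bars)
GROUPS = {
    "two (A-H)":   (8, 2, 1.21, 1.48),
    "three (I-K)": (3, 3, 1.77, 1.54),
    "four (L-N)":  (3, 4, 1.57, 1.72),
    "five (O)":    (1, 5, 1.29, 2.14),
}
CIRCLES_COL, BARS_COL = 2, 3

def unweighted_mean(col):
    """Plain average of the four group means."""
    vals = [g[col] for g in GROUPS.values()]
    return sum(vals) / len(vals)

def weighted_mean(col):
    """Weight each group by its number of sectors (figures x sectors per figure)."""
    weights = [n * k for n, k, *_ in GROUPS.values()]
    vals = [g[col] for g in GROUPS.values()]
    return sum(w * v for w, v in zip(weights, vals)) / sum(weights)

for label, col in (("circles", CIRCLES_COL), ("bars", BARS_COL)):
    print(label, round(unweighted_mean(col), 2), round(weighted_mean(col), 2))
```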


(2) Average Error by Sex 
The following table is similar to Table II, but divided on the basis 
of sex. 














TABLE III

MEAN ERROR IN JUDGING CIRCLES AND BARS, BY SEX

                               Men                          Women
Circles or Bars       Mean Error    Mean Error     Mean Error    Mean Error
Having                in Judging    in Judging     in Judging    in Judging
                      Circles       Bars           Circles       Bars
Two Divisions            1.06          1.38           1.30          1.54
Three Divisions          1.64          1.43           1.84          1.62
Four Divisions           1.39          1.69           1.71          1.77
Five Divisions           1.15          2.34           1.44          1.94


























































































For each sex, this table shows the same characteristics as Table II, 
as regards comparative accuracy of circles and bars. In addition it 
shows the greater accuracy in judgment on the part of the men. 
(Probable errors of the mean for men and women are given in Table VI.) 


(3) Average Error by Subdivisions 
Table IV exhibits in detail the mean error for each sector of a circle
and the corresponding subdivision of a bar. The means are the same
as given in Table II.
TABLE IV

MEAN ERROR, BY SUBDIVISIONS, BY CIRCLES AND BY BARS

                                        Mean Error    Mean Error    Error by
                   Figure   Per Cent    by Circles    by Bars       Circles Less
Two Divisions        A         25          0.09          0.72            *
                     B         33          1.85          1.07
                     C         22          1.13          1.79            *
                     D         28          1.30          1.73            *
                     E          8          1.38          1.37
                     F         44          1.59          2.16            *
                     G         25          0.42          0.60            *
                     H          …          1.94          2.38            *
                   Mean                    1.21          1.48
Three Divisions      I         33          1.24          0.89
                               37          2.43          2.23
                               30          1.78          1.55
                     J         30          1.50          1.58            *
                               33          1.31          0.63
                               37          2.30          2.01
                     K         42          2.07          2.04
                               33          2.16          1.49
                               25          1.10          1.45            *
                   Mean                    1.77          1.54
Four Divisions       L         10          1.07          0.98
                               20          1.08          0.93
                               30          1.17          1.13
                               40          1.29          1.23
                     M         33          2.24          1.85
                               28          1.68          1.77            *
                               17          1.84          1.97            *
                               22          1.43          1.91            *
                     N         44          1.78          2.03            *
                                8          1.51          1.74            *
                               33          2.15          2.81            *
                               15          1.64          2.23            *
                   Mean                    1.57          1.72
Five Divisions       O         42          1.88          2.74            *
                               13          1.68          2.01            *
                               20          1.17          2.76            *
                               10          0.92          1.17            *
                               15          0.81          2.03            *
                   Mean                    1.29          2.14























This table shows that more accurate judgments were expressed by
means of circles (marked *) in 20 cases; by bars in 14 cases. It shows
that more accurate judgments were expressed by means of circles
when there were two, four, or five subdivisions; by bars when there were
three subdivisions.
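The tally of 20 and 14 can be reproduced from Table IV. In this sketch (Python; the data layout is an illustrative assumption, and figure H's percentage, illegible in this copy, is stored as `None`), a row counts for the circles when its mean error is smaller for the circle than for the bar. The same rows give the unrounded averages for figure M, which the text reports as 1.80 and 1.88.

```python
# Table IV rows: (figure, per cent, mean error by circles, mean error by bars).
# Figure H's percentage is illegible in this copy and is stored as None.
TABLE_IV = {
    2: [("A", 25, 0.09, 0.72), ("B", 33, 1.85, 1.07), ("C", 22, 1.13, 1.79),
        ("D", 28, 1.30, 1.73), ("E", 8, 1.38, 1.37), ("F", 44, 1.59, 2.16),
        ("G", 25, 0.42, 0.60), ("H", None, 1.94, 2.38)],
    3: [("I", 33, 1.24, 0.89), ("I", 37, 2.43, 2.23), ("I", 30, 1.78, 1.55),
        ("J", 30, 1.50, 1.58), ("J", 33, 1.31, 0.63), ("J", 37, 2.30, 2.01),
        ("K", 42, 2.07, 2.04), ("K", 33, 2.16, 1.49), ("K", 25, 1.10, 1.45)],
    4: [("L", 10, 1.07, 0.98), ("L", 20, 1.08, 0.93), ("L", 30, 1.17, 1.13),
        ("L", 40, 1.29, 1.23), ("M", 33, 2.24, 1.85), ("M", 28, 1.68, 1.77),
        ("M", 17, 1.84, 1.97), ("M", 22, 1.43, 1.91), ("N", 44, 1.78, 2.03),
        ("N", 8, 1.51, 1.74), ("N", 33, 2.15, 2.81), ("N", 15, 1.64, 2.23)],
    5: [("O", 42, 1.88, 2.74), ("O", 13, 1.68, 2.01), ("O", 20, 1.17, 2.76),
        ("O", 10, 0.92, 1.17), ("O", 15, 0.81, 2.03)],
}

rows = [row for group in TABLE_IV.values() for row in group]
circle_wins = sum(1 for _, _, c, b in rows if c < b)   # circles more accurate
bar_wins = sum(1 for _, _, c, b in rows if b < c)      # bars more accurate

def figure_mean(letter, col):
    """Average of one column (2 = circles, 3 = bars) over a figure's sectors."""
    vals = [r[col] for r in rows if r[0] == letter]
    return sum(vals) / len(vals)

print(circle_wins, bar_wins)  # 20 14
print(f"{figure_mean('M', 2):.4f} {figure_mean('M', 3):.4f}")
```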
















Particular interest attaches to circle M and to the corresponding
bar. This is the example used by Secrist in such detail to show the
inaccuracy of circle representations.¹ Yet the average error for this
circle was 1.80; for the corresponding bar, 1.88.


(4) Large Errors 


Another method of comparison is to consider the number of “large”
errors made by each method. “Large” is a purely relative term, but
let it be assumed that errors of only one, two, or three points in judg-
ing a diagram are of little consequence, but that errors greater than
three points are serious and likely to lead to bad judgments. In other
words, let “large errors” be defined as those greater than three points.
This is entirely arbitrary, but it does not seem unreasonable. The total
number of “large errors,” thus defined, made by the entire class is
shown in the following table.


TABLE V

TOTAL NUMBER OF LARGE ERRORS IN JUDGING CIRCLES AND BARS

Figure      Large Errors     Large Errors     Errors in
            in Circles       in Bars          Circles Fewer
A                 0                7               *
B                 6                5
C                 2                9               *
D                 4                7               *
E                 4                3
F                12               29               *
G                 4                6               *
H                22               31               *
I                16               10
J                12                8
K                29               27
L                24               38               *
M                24               34               *
N                33               68               *
O                 9               67               *

Total           201              349               *














In all except five cases (B, E, I, J, K) the superiority of the circles 
is clearly evident, and in those five the balance in favor of the bars is 
slight. It is noteworthy that no large errors were made in judging 
circle A, while seven were made in the corresponding bar; that more 
than twice as many large errors were made by bars in figures F and N; 
that four and a half times as many large errors were made by bars in 
figure C; and that seven and a half times as many large errors were made 
by bars in figure O, containing five subdivisions. The total number of 
large errors was seventy per cent greater in the case of bars than of 
circles. 
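The comparisons in this paragraph follow from Table V with a little arithmetic. A sketch (Python; the names are illustrative, and the threshold of three points is the author's stated, admittedly arbitrary, definition): the bars' total of 349 is about 74 per cent above the circles' 201, reported in the text as seventy per cent; figure C gives 9/2 = 4.5, and figure O gives 67/9, about 7.4.

```python
# Table V: figure -> (large errors in circles, large errors in bars)
LARGE = {
    "A": (0, 7), "B": (6, 5), "C": (2, 9), "D": (4, 7), "E": (4, 3),
    "F": (12, 29), "G": (4, 6), "H": (22, 31), "I": (16, 10), "J": (12, 8),
    "K": (29, 27), "L": (24, 38), "M": (24, 34), "N": (33, 68), "O": (9, 67),
}
THRESHOLD = 3  # an error counts as "large" when it exceeds three points

def is_large(error_points, threshold=THRESHOLD):
    """Classify a single error, in whole percentage points."""
    return error_points > threshold

total_c = sum(c for c, _ in LARGE.values())
total_b = sum(b for _, b in LARGE.values())
bars_fewer = sorted(k for k, (c, b) in LARGE.items() if b < c)

print(total_c, total_b, round(total_b / total_c - 1, 2))  # 201 349 0.74
print(bars_fewer)                                          # ['B', 'E', 'I', 'J', 'K']
print(LARGE["C"][1] / LARGE["C"][0], round(LARGE["O"][1] / LARGE["O"][0], 2))
```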


1 Secrist, H., Ibid., (Revised Edition), p. 182. 

























(5) Comparison of Individuals 


An individual accuracy score was secured by taking the average
error for all of the sectors on each paper, both circles and bars.
Comparisons were then made between the two papers belonging to the
same individual. On account of absences on one day or the other
there were only 85 sets of papers that were available for this
comparison, which is shown below:

                                          Entire Class     Men     Women
Number doing better with circles               57           22       35
Number doing better with bars                  28            9       19

Each sex and the class as a whole did better with circles twice as
often as with bars.
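The paired comparison can be sketched as follows (Python). The function names and the sample papers are hypothetical, invented only to show the bookkeeping: each paper's score is the average error over its sectors, and only students present on both days are compared, which is why absences reduced the class to 85 pairs.

```python
from statistics import mean

def individual_score(sector_errors):
    """Average error, in points, over all sectors on one student's paper."""
    return mean(sector_errors)

def compare_pairs(circle_papers, bar_papers):
    """Count students who did better on circles, on bars, or tied.

    Both arguments map a student identifier to that student's list of sector
    errors; only students present on both days are compared.
    """
    better_circles = better_bars = ties = 0
    for student in circle_papers.keys() & bar_papers.keys():
        c = individual_score(circle_papers[student])
        b = individual_score(bar_papers[student])
        if c < b:
            better_circles += 1
        elif b < c:
            better_bars += 1
        else:
            ties += 1
    return better_circles, better_bars, ties

# Hypothetical papers; student "s4" was absent on the day of the bars.
circles = {"s1": [1, 2, 0], "s2": [3, 1, 2], "s3": [2, 2, 2], "s4": [0, 1, 1]}
bars = {"s1": [2, 2, 1], "s2": [1, 1, 2], "s3": [2, 1, 3]}
print(compare_pairs(circles, bars))  # (1, 1, 1)
```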


(6) Comparison of Special Groups 


A further summary of individual scores for special groups of
students is given in Table VI.

TABLE VI

SUMMARY OF INDIVIDUAL SCORES FOR SPECIAL GROUPS

                                          Circles      Bars
All Students
  Number                                     97          94
  Mean                                      1.41        1.60
  Standard Deviation (σ)                    0.66        0.61
  Probable Error of Mean                    0.05        0.04
Men
  Number                                     35          36
  Mean                                      1.27        1.55
  Standard Deviation (σ)                    0.40        0.51
  Probable Error of Mean                    0.05        0.06
Women
  Number                                     62          58
  Mean                                      1.50        1.63
  Standard Deviation (σ)                    0.74        0.63
  Probable Error of Mean                    0.06        0.06
Mathematics Majors
  Number                                      8           8
  Mean                                      1.50        1.63
  Standard Deviation (σ)                    0.30        0.38
  Probable Error of Mean                    0.07        0.09
Students Who Worked Rapidly
  Number                                     37          34
  Mean                                      1.48        1.83
  Standard Deviation (σ)                    0.44        0.73
  Probable Error of Mean                    0.05        0.08
Statistics Class
  Number                                     11          12
  Mean                                      1.37        1.48
  Standard Deviation (σ)                    0.59        0.51
  Probable Error of Mean                    0.12        0.10




























The superiority of the circles appears in each group compared. 
The superiority of the men over the women is also evident. 
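The probable errors in Table VI agree, to two decimals, with the usual formula of the period, 0.6745 σ/√N, applied to the printed standard deviations and numbers. A sketch (Python; the constant and function names are illustrative) recomputes the probable-error line for every group:

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 standard errors

def probable_error_of_mean(sd, n):
    """Probable error of a mean: 0.6745 * sigma / sqrt(N)."""
    return PE_FACTOR * sd / math.sqrt(n)

# Table VI: group -> ((N, sigma) for circles, (N, sigma) for bars)
TABLE_VI = {
    "All Students": ((97, 0.66), (94, 0.61)),
    "Men": ((35, 0.40), (36, 0.51)),
    "Women": ((62, 0.74), (58, 0.63)),
    "Mathematics Majors": ((8, 0.30), (8, 0.38)),
    "Students Who Worked Rapidly": ((37, 0.44), (34, 0.73)),
    "Statistics Class": ((11, 0.59), (12, 0.51)),
}

for group, ((nc, sc), (nb, sb)) in TABLE_VI.items():
    print(f"{group:28s} {probable_error_of_mean(sc, nc):.2f} "
          f"{probable_error_of_mean(sb, nb):.2f}")
```

For the rapid workers, the bar mean exceeds the class mean by 0.23, nearly three times the group's probable error of 0.08, whereas the circle difference of 0.07 is close to the probable errors involved.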

Can students with considerable mathematical training judge either 
circles or bars more accurately than an unselected group? Eight of 
the class were mathematics majors, six men and two women. As far 
as conclusions can be drawn from this limited number, it seems that 
students skilled in mathematics judge slightly better than the general 
group, but not markedly so. The six men ranked 4, 6, 8, 9, 27, and 29 
among the 40 men arranged according to score in judgment of circles. 
The two women similarly ranked 17 and 25 among the 66 women. 
These facts seem to indicate that circles are excellent for the general
reader—that their utility and accuracy are not limited to those with
special mathematical training. Further evidence along this line is
suggested by the fact that a seventh grade boy with the circle diagram 
made a score which would have placed him in the third quartile of the 
college class tested; while a college professor (not of mathematics!) 
only succeeded in making a score which would place him in the lowest 
quartile! 

Did the students who worked rapidly work any more accurately
than those who studied the diagrams more carefully? To investigate
this question the papers of the students who completed ten or more
circles or bars in five minutes (see Table I) were summarized separately.
The results indicate slightly less accuracy for the rapid working group
than for the entire class in the case of circles, but the probable errors
indicate that this difference can hardly be considered significant. On the
other hand, in the case of the bars, the results show significantly less
accuracy for the rapid working group.

The entire experiment was also tried with the members of a small 
class in statistical methods, composed of juniors and seniors. The re- 
sults were similar to those for the larger class, but the small number 
makes the results in themselves less reliable. 

From every standpoint the conclusion seems to be clear that greater 
accuracy is indicated by circle representations than by bar repre- 
sentations. And this in spite of the fact that the experiment was 
arranged so as to give the advantage to the bars. 


3. Method of Judgment 


The third subject of special interest is the method of judgment
used. Thirteen men and nineteen women thought they used two of the
suggested methods, but indicated one as predominant. Two men
claimed to have used all of the first three methods. Table VII gives
the number claiming to use each of the four suggested methods either
exclusively or predominantly.


TABLE VII

METHOD OF JUDGMENT OF SECTORS OF CIRCLES

                                                    Entire Class
Method                      Men      Women      Number     Per cent
By Areas                     …         …           …           …
By Central Angles            …         …           …           …
By Arcs                      …         …           …           …
By Chords                    0         1           1           …














No single method of judgment was used by all, although that by arcs on
the circumference was most frequent. The distribution for the two
sexes is essentially the same. The method by chords, the chief basis 
of Secrist’s extended criticism, was used by only one student, a woman! 
One other used it as a secondary method in connection with the 
method of central angles. Conversation with the student who used it 
exclusively revealed the fact that she had had considerable special 
training in a similar method of judgment in her work as an artist. 

What was the relative accuracy of these four methods of judgment? 
The answer is found in Table VIII. 


TABLE VIII

ACCURACY OF JUDGMENT OF SECTORS OF CIRCLES BY DIFFERENT METHODS

                                                                Probable Error
Method                 Number of …    Mean Error    S. D. (σ)     of Mean
By Areas                    …              …             …           0.06
By Central Angles           …              …             …           0.06
By Arcs                     …              …             …           0.05
By Chords                   …              …             …            …

















Judgments expressed by any one of the three methods commonly 
used are of practically equal accuracy. There is no significant differ- 
ence between them. 


4. Popularity and Appeal 


After the conclusion of the experiment, and before it was discussed 
with them, the students were asked the following question: 

If you were required to draw a diagram showing the relative number of stu- 
dents—seniors, juniors, sophomores, and freshmen—composing the entire 
student body of the college, would you choose a subdivided circle or a sub- 
divided bar? 



























Twenty-five students expressed a choice of the bar, seventy-one 
of the circle. 

It is not within the scope of this paper to give examples of the 
common use of circles to show component parts in the published lit- 
erature. Reference may be made, however, to the very effective use of 
127 circles in the Statistical Atlas of the United States, to their fre- 
quent use in Dr. Ayres’ War With Germany, and in numerous recent 
school surveys. 

Such references, however, are unnecessary, since the popularity,
common use, and psychological appeal of circle diagrams are
admitted. The principal basis of criticism has been their supposed
inaccuracy.


5. Conclusions 


Referring to the four criticisms summarized on pages 121-122, it 
may be said that the experiment described in this paper seems to 
justify the following conclusions: 


1. Circle diagrams to show component parts can be read fully as 
rapidly and easily as bar diagrams. 

2. Circle diagrams to show component parts can be read more accu- 
rately than bar diagrams. In addition, accuracy of judgment 
increases with the number of subdivisions in circles, but de- 
creases in the case of bars. 

3. Circle diagrams to show component parts are judged by different
methods, approximately 50 per cent of the group considered
having used arcs, 25 per cent areas, and 25 per cent central
angles. But whatever the method of judgment, there is no
significant difference in accuracy of judgment by the different
methods.

4. In view of the three conclusions stated above, the use of circle
diagrams to show component parts is worthy of encouragement;
they should be recommended, not only on account of
their popularity and psychological appeal, but also on the basis
of scientific accuracy; their use should be considered a
“compliment to a man’s intelligence.”










THE FORMATION OF FREQUENCY DISTRIBUTIONS 


By John B. Canning, Stanford University


THE PROBLEM STATED 


A survey of the literature finds that much has been written upon the 
numerical treatment of data in the form of frequency distributions and 
that little consideration seems to have been given to the formation of 
the distributions themselves. Of the first type of study Mr. W. F.
Sheppard’s paper “On the most Probable Values of Frequency
Constants”¹ is a well known and excellent example. Among the results of
this study are to be found certain particular formulae, commonly
referred to as “Sheppard’s corrections,”² whereby the values of the
second and higher moments of a special type of distribution may be
adjusted to compensate for errors likely to be introduced by treating all
frequencies as though they were located at the mid-values of their 
respective classes. 

When one is dealing with a distribution of a suitable type, these simple corrections are convenient; and when the original variates are not at hand, the corrections are valuable. Since these conditions are often met, the paper referred to must continue to be regarded as a useful one. But it should be remembered that the corrected values themselves are but approximations to values that might have been obtained more precisely without adjustment if the original records had been available or, lacking these, if the distribution had been so formed in the first place that the ordinary methods of calculating the moments would serve without adjustment.

Whenever inferences are to be drawn from the value of any moment 
of a distribution, or from any function of a moment, it is obvious that 
the most probable value of the property is the one desired. In the ab- 
sence of other reliable information, that value is always the one to be 
found by employing the recorded values of the variates. That is, given 
the original records, no subsequent arrangement into intervals greater 
than the one in which the original readings were taken can, a priori, 
result in a distribution that, in turn, will yield a value for any of the 
moments more reliable than that obtainable directly from the variates. 
And this is true no matter what operations are performed upon the 
distribution. 


¹ Proceedings of the London Mathematical Society, Vol. XXIX, p. 353.
² Formulae numbered (28), (29), (30), and (31), pp. 368-369, ibid.














American Statistical Association [16 





134 


To the extent, then, that convenience in publication and suggestiveness of the exhibit require the forming of frequency distributions, it seems desirable that the distributions adopted should be those that affect as little as possible the values of the principal properties. Given a set of variates, it may be assumed that a frequency distribution of equal intervals of any given magnitude¹ may be found such that any given measure (of position, dispersion, etc.) derived from it by conventional operations will deviate from the true measure as little as, or less than, the like value derivable from any other distribution of equal intervals. Of course, it is not true that any one arrangement can be expected to affect least the values of each and every one of the important constants. All that is asserted is that there is at least one distribution better with respect to some particular constant than any other having intervals of like magnitude. However, trials of the procedure to be developed herein will show that very often it is possible to set up a distribution such that none of its really important constants is so affected as to make it needful to adjust any of the values of the constants found by the simplest of conventional procedures.

Since the arithmetic mean is, legitimately, the most commonly employed measure of position, and since the standard deviation is the most generally convenient measure of dispersion, the procedure with respect to these has been considered first. This paper offers, first, a convenient method of forming a frequency distribution such that its mean will deviate from the true mean by a minimum and, second, a method of like intent with respect to the standard deviation.


THE FREQUENCY DISTRIBUTION AND THE ARITHMETIC MEAN 


Expressed symbolically, it is desired that ΣX − Σ(p·m) shall be numerically a minimum,² wherein X represents the values of the variates and p·m represents the product, population (frequency) of an interval times its mid-value.
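As a minimal sketch of the quantity being minimized, the Python function below computes ΣX − Σ(p·m) directly from raw values for one chosen set of class limits. The function name, its parameters, and the five sample readings are illustrative assumptions, not taken from the paper; exact fractions are used so that limits falling midway between readings introduce no rounding.

```python
from fractions import Fraction


def aggregate_grouping_error(values, limit, width):
    """Return ΣX − Σ(p·m) when `values` are grouped into classes of equal `width`.

    `limit` is any one class limit; all other limits lie at limit + k * width.
    Limits are assumed to fall between recorded readings, so no value sits on one.
    """
    total_x = sum(values)
    total_pm = 0
    for x in values:
        k = (x - limit) // width                       # index of the class holding x
        mid = limit + k * width + Fraction(width, 2)   # mid-value of that class
        total_pm += mid                                # each variate contributes m once
    return total_x - total_pm


# Hypothetical readings in cents, recorded to the nearest five cents.
readings = [5, 10, 10, 20, 35]
print(aggregate_grouping_error(readings, Fraction(5, 2), 25))   # -20
print(aggregate_grouping_error(readings, Fraction(25, 2), 25))  # 30
```

The same five readings, grouped in twenty-five-cent classes with two different sets of limits, give aggregate errors of −20 and +30 cents, which is why the choice of limits matters.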

The conventional procedure of substituting Σ(p·m) for ΣX is variously justified by the writers. The implicit assumption is explained to be: (1) “that all values within the interval occur at the mid-value”; (2) “that the frequencies are uniformly distributed within the interval”; (3) “that the values are symmetrically arranged about the mid-values”; or (4) “that the arithmetic means and the medians (mid-values) of the intervals coincide.” But really to justify the procedure would require the further assumption, seldom expressed and often contrary to fact, that the error in the mean arising from the expressed assumption will be negligibly small. It is obvious that, if any of the four assumptions were true in fact in any given case, ΣX − Σ(p·m) would be equal to zero. Not only is it certain that none of these is often approximately true in fact, but it is certain also that all four of these assumptions may be very wide of the mark and still ΣX − Σ(p·m) may be found to be numerically very small. The general condition to be satisfied is that the sum of the deviations of the variates from the mid-values of their respective intervals shall be zero or, at least, very small. It will be seen at once that this condition includes all those enumerated above.

¹ The question of whether or not equal intervals are always to be preferred is left aside for the moment. It may very well be that, for some properties, the intervals should be uniform with respect to some function of the variates rather than with respect to the variates themselves. Thus, if the geometric mean of the distribution is likely to be the most important measure of position, it may perhaps be best to employ logarithmically equal intervals.

² Instances in which values of 1 − ΣX/Σ(p·m) are of greater importance than those of ΣX − Σ(p·m) will not be found to require a different treatment, although, occasionally, a different set of limiting values for the intervals would be adopted.

For the sake of convenience and brevity, the procedure for finding a distribution for which Σ(p·m) is approximately equal to ΣX will be limited to the cases in which uniform intervals are desired. These uniform intervals, for the same reasons, will be assumed to be multiples of the unit in terms of which the original record was taken. Third, the limits of the intervals will be taken as falling at points midway between the scale readings at which observations were (or would have been) recorded. Thus, if the data to be grouped are payroll amounts wherein the employee’s pay for a calendar week is his rate per hour times the number of hours worked, expressed to the nearest five cents, the intervals in the distribution would be ten cents, fifteen cents, …, one dollar, …, and the limiting values of the intervals would fall at $X.025, $X.075, $X.125, ….¹ If a distribution of twenty-five-cent intervals were adopted, there would be five possible distributions, each with its peculiar set of interval limits and mid-values. If a dollar grouping were chosen, there would be twenty such distributions, the mid-values of the intervals of which would be $X.025, $X.075, …, $X.975. And, of course, it is not only the mid-values that would have common decimal endings; for, whichever of the twenty distributions is taken, every frequency of a given digit ending will fall at precisely the same distance from the mid-value of its class.
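The count of possible distributions can be checked mechanically. The sketch below is an illustration assuming readings in whole cents at a fixed recording unit and class limits midway between readings; it lists the distinct mid-value positions (in cents, taken modulo the interval width) produced by shifting the limits one reading at a time. The function name is hypothetical.

```python
from fractions import Fraction


def midvalue_positions(unit_cents, units_per_interval):
    """Distinct mid-value positions, in cents modulo the interval width.

    Readings are multiples of `unit_cents`; class limits lie halfway between
    readings, so shifting the limits by one reading gives the next distribution.
    """
    width = unit_cents * units_per_interval
    positions = set()
    for shift in range(units_per_interval):
        lower = unit_cents * shift + Fraction(unit_cents, 2)  # a limit between readings
        mid = lower + Fraction(width, 2)                      # centre of that class
        positions.add(mid % width)
    return sorted(positions)


quarter = midvalue_positions(5, 5)     # twenty-five-cent intervals
dollar = midvalue_positions(5, 20)     # one-dollar intervals
print([int(p) for p in quarter])       # [0, 5, 10, 15, 20]
print(len(dollar), float(dollar[0]), float(dollar[-1]))  # 20 2.5 97.5
```

The dollar positions 2.5, 7.5, …, 97.5 cents correspond to the mid-values $X.025, …, $X.975 in the text: an odd number of units per interval puts the mid-values on readings, an even number puts them between readings.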

This simple relation, together with the desideratum that the algebraic sum of all deviations of variates from the mid-values of their respective classes shall be a minimum, permits a systematic test of the goodness of each


¹ While these are special cases only, the procedure that follows will be found applicable quite generally, although it will be simplest to use when, as here, the commonest practice is followed.




















of the possible sets of limits easily to be made. It will, perhaps, be 
clearer if the method is first illustrated specifically and then made 
general. 






TABLE I 


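PRIMARY DISTRIBUTION OF NUMBER OF EMPLOYEES WHO, FOR THEIR WORK OF THE WEEK, RECEIVED:¹

[The body of Table I is not legible in the source scan. As described in the text, it lists the possible weekly payments at five-cent steps, the number of employees receiving each, and a left-hand column of serial numbers 1 to 5 repeated down the page.]

¹ Data from Mills, Statistical Method.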


The column on the extreme left is but a numbering in repeating series of 5’s (the number of primary 
intervals in the interval of the distribution to be formed). This repeating series might begin at any 
point on the money scale and with any of the numbers in the series 1 to 5. 




























I. The Method Illustrated 


Suppose that the original data are the amounts paid to the employees of an enterprise at the end of a given week, earnings being calculated to the nearest multiple of five cents. Let the requirement be to find that frequency distribution of twenty-five-cent intervals for which Σ(p·m) is most nearly equal to the total payroll debit. The steps taken are:

A. Form a primary distribution, i.e., a schedule of amounts within 
the range that might have been paid and the frequency of each actual 
payment. The mid-values of such a distribution will, of course, fall at 
money payments of digit endings $X.00, $X.05, . . . $X.95. 

It is evident that if we elect that frequency distribution wherein the 
mid-values fall at money values ending in .10, .35, .60, and .85 (the 
values falling opposite the serial number 3) all values that end in .20, 
.45, .70, and .95 will lie two units (ten cents) above the mid-values of 
their classes. And all values that end in .05, .30, .55, and .80 will lie 
one unit below the mid-values, and so on. 

B. Proceed, then, to collect the frequencies falling opposite each of 
the serial numbers: 

Opposite the serial numbers: 1; 2; 3; 4; and 5 
The total frequencies are 56; 71; 44; 26; and 13 respectively. 
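Step B amounts to summing frequencies by serial number, the serial number being the reading's position in a series that repeats every i readings. A minimal sketch, assuming the primary distribution is held as a mapping from scale readings (counted in recording units, here five-cent steps) to frequencies; the function name, the `start` parameter, and the tiny data set are hypothetical, since the body of Table I is not recoverable here.

```python
def serial_totals(primary, i, start=0):
    """Sum the frequencies opposite each serial number 1..i.

    primary: dict mapping a reading, in whole recording units, to its frequency.
    start:   the reading that receives serial number 1.
    Returns a list whose entry j holds the total for serial number j + 1.
    """
    totals = [0] * i
    for reading, freq in primary.items():
        serial = (reading - start) % i  # 0-based position in the repeating series
        totals[serial] += freq
    return totals


# Hypothetical readings (units of five cents) and frequencies.
primary = {0: 3, 1: 2, 5: 4, 7: 1, 12: 6}
print(serial_totals(primary, 5))  # [7, 2, 7, 0, 0]
```

Applied to the payroll of Table I, this step yields the totals 56, 71, 44, 26, and 13 quoted above.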

C. Calculate ΣX − Σ(p·m) for each of the five distributions.

Let D₁, D₂, …, D₅ represent the five possible distributions of which the mid-values fall opposite serial numbers 1, 2, …, 5. Then, for any distribution D, ΣX − Σ(p·m) = Σ(f·d), wherein d denotes the deviation of an observed value from the mid-value of its class (the subscript of d giving that deviation in units), and f denotes the frequency of the associated value of d.


TABLE OF PRODUCTS, (f·d), FOR THE FIVE DISTRIBUTIONS

    d         D₁      D₂      D₃      D₄      D₅
    2         88      52      26     112     142
    1         71      44      26      13      56
    0          0       0       0       0       0
   −1        −13     −56     −71     −44     −26
   −2        −52     −26    −112    −142     −88
  Σ(f·d)      94      14    −131     −61      84








Note that the sum of the five values of Σ(f·d) must always be zero; for every frequency f is associated successively with values of d of 2, 1, 0, −1, and −2. This affords a convenient check upon the accuracy of the arithmetical work (here 94 + 14 − 131 − 61 + 84 = 0). Note also that, even for the data employed here, and even with the small interval, considerable differences in the values of Σ(f·d) occur.
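The direct calculation of paragraph C can be sketched in Python as follows. The sketch assumes an odd number i of primary units per interval, so that each mid-value falls on a reading, and the function name is hypothetical. With the serial totals given in the text it reproduces the bottom row of the table of products.

```python
def sums_of_fd(totals):
    """Σ(f·d) for each distribution D_k, k = 1..i, computed directly (odd i assumed).

    totals[j] is the frequency opposite serial number j + 1; D_k has its
    mid-values opposite serial number k.
    """
    i = len(totals)
    r = (i - 1) // 2
    sums = []
    for k in range(i):
        s = 0
        for j, f in enumerate(totals):
            d = (j - k) % i   # units above the mid-value, before wrapping
            if d > r:         # wrap into the range -r..r
                d -= i
            s += f * d
        sums.append(s)
    return sums


payroll = [56, 71, 44, 26, 13]   # totals opposite serial numbers 1..5 (from the text)
result = sums_of_fd(payroll)
print(result)       # [94, 14, -131, -61, 84]
print(sum(result))  # 0, the arithmetical check noted in the text
```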













II. The Method Shortened 


Inspection of the specimen computation shows, though, that the amount of numerical work by this method increases roughly with the square of the magnitude of the interval. So long as this magnitude is but a small multiple of the unit of recorded measure, this process is not over-tedious. But if, instead of a twenty-five-cent interval, a two-dollar interval is chosen, the number of products (f·d) to be calculated (including those of zero) becomes 1,600 instead of 25. And if the initial record had been made to the nearest cent, then, for a two-dollar interval, the number of products to be found would be 40,000!

Fortunately it is possible to shorten the labor when the interval is 
large. Consider the following schedule: 

SCHEDULE OF THE FREQUENCIES OF DEVIATIONS d₂, d₁, …, d₋₂ IN THE FIVE DISTRIBUTIONS D₁, D₂, …, D₅

  Values              Frequencies in:
   of d       D₁      D₂      D₃      D₄      D₅
   d₂         44      26      13      56      71
   d₁         71      44      26      13      56
   d₀         56      71      44      26      13
   d₋₁        13      56      71      44      26
   d₋₂        26      13      56      71      44























Inspection of the schedule shows at once that, as one reads from left to right by columns, the frequency of d₂ in D₁ becomes the frequency of d₁ in D₂, …, and of d₋₂ in D₅. Likewise, the frequency of d₁ in D₁ becomes that of d₀ in D₂, and so on. Adopt, in lieu of the numerical subscripts of d, the general notation wherein i denotes the magnitude of the interval (here five units, or twenty-five cents) and r denotes the maximum numerical deviation of an observed value from the mid-value of its interval, so that r = (i − 1)/2. Values of d then range from dᵣ to d₋ᵣ. Substituting this notation in the schedule above, it will be noted that the frequencies opposite dᵣ to d₋ᵣ₊₁ in column D₁ are associated in column D₂ with values of d whose subscripts are diminished by unity, and that the frequency opposite d₋ᵣ in D₁ becomes the frequency of dᵣ in D₂; i.e., for this frequency the subscript of d is increased by 2r.

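If f·d denotes the products, frequency times deviation from mid-value, in D₁, and f′·d′ denotes like products in D₂, and if the subscript of f is the associated value of d, then, n being the total number of observations,

\[ \Sigma(f'\cdot d') = \Sigma(f\cdot d) - (n - f_{-r}) + 2r\,f_{-r} \]

and, since f₋ᵣ = f′ᵣ and since 2r + 1 = i,

\[ \Sigma(f'\cdot d') = \Sigma(f\cdot d) + i\cdot f'_{r} - n \qquad (1) \]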




















The value of Σ(f′·d′) found by substituting in (1) can then be employed in finding the like value in D₃, and so on. In this formula i and n are constants, and the value of f′ᵣ can be read from the schedule of frequencies falling opposite the several serial numbers (see schedule p. 138). Only one value, Σ(f·d), need be calculated independently (as in paragraph C) instead of i such calculations.


The calculation then becomes: 




















  Terms of Formula             D₁        D₂       D₃       D₄       D₅
  Σ(f·d), preceding D        (84)*      94       14     −131      −61
  i·f′ᵣ                     (220)*     130       65      280      355
  Sum of above              (304)*     224       79      149      294
  n                         (210)*     210      210      210      210
  Σ(f′·d′)                    94        14     −131      −61       84











*These values were not here employed in obtaining the final total (94) of the column. The latter 
value was independently found. The numbers in parentheses are inserted to show how the final value 
of this column would have been obtained had the process been begun with another distribution. 
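Formula (1) turns the i separate calculations into one direct calculation followed by a running update. The sketch below (function name hypothetical; odd i assumed, as in the worked example) starts from the directly computed Σ(f·d) = 94 for D₁ and reproduces the remaining entries of the table just given.

```python
def sums_by_formula_1(totals, first_sum):
    """Σ(f·d) for D_1..D_i from one directly computed value, using formula (1).

    Odd i assumed; totals[j] is the frequency opposite serial number j + 1,
    and D_k has its mid-values opposite serial number k.
    """
    i = len(totals)
    n = sum(totals)
    r = (i - 1) // 2
    sums = [first_sum]
    for k in range(1, i):
        f_r_prime = totals[(k + r) % i]  # frequency at d = +r in the next distribution
        sums.append(sums[-1] + i * f_r_prime - n)
    return sums


print(sums_by_formula_1([56, 71, 44, 26, 13], 94))  # [94, 14, -131, -61, 84]
```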


III. The Method Applied to Wider Intervals. 


If, instead of a twenty-five-cent interval, a dollar grouping were desired, it would be necessary merely to number the measuring-scale graduations in series of 20’s (instead of 5’s as above). From the schedule of frequencies of the 20 values of d, the value of Σ(f·d) may be found for one of the twenty distributions. If this value is then substituted in formula (1), together with the value of i, the successive values of i·f′ᵣ, and the value of n, the 19 other values of Σ(f′·d′) can quickly be found. The amount of arithmetical work by this method increases only in proportion to i, and so, even for very wide intervals, the labor of finding the best set of limits is not great. Certainly it is not great in comparison with the scrupulous solicitude bestowed by careful statisticians on obtaining precision in original measures.
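For wider intervals the same update applies unchanged. The sketch below is a generalization under stated assumptions rather than the paper’s own tabulation: it labels each distribution by the serial number at the lower end of its classes, so that even interval widths such as the dollar grouping (20 units, deviations in half-units) are handled with exact fractions, and it checks the shortcut against the direct calculation of i² products. Function names and the 20-entry data are hypothetical.

```python
from fractions import Fraction


def sums_fd_shortcut(totals):
    """Σ(f·d) for all i distributions via the one-step update of formula (1).

    Distributions are labelled by the serial (0-based) at the lower end of each
    class, which also covers even i (deviations are then half-units).
    """
    i = len(totals)
    n = sum(totals)
    r = Fraction(i - 1, 2)
    first = sum(f * (j - r) for j, f in enumerate(totals))  # the one direct calculation
    sums = [first]
    for k in range(1, i):
        # Moving every limit up one unit: the frequency that sat at d = -r now sits
        # at d = +r, and every other deviation falls by one unit.
        sums.append(sums[-1] + i * totals[k - 1] - n)
    return sums


def sums_fd_direct(totals):
    """The same quantities computed afresh for each distribution (i**2 products)."""
    i = len(totals)
    r = Fraction(i - 1, 2)
    return [sum(f * (((j - k) % i) - r) for j, f in enumerate(totals)) for k in range(i)]


hypothetical = [(7 * j) % 11 + 1 for j in range(20)]  # 20 serial totals, dollar grouping
assert sums_fd_shortcut(hypothetical) == sums_fd_direct(hypothetical)
print([int(s) for s in sums_fd_shortcut([56, 71, 44, 26, 13])])  # [-131, -61, 84, 94, 14]
```

With the payroll totals, the five values are those of D₃, D₄, D₅, D₁, and D₂ in the paper’s labelling; only the starting point of the cycle differs.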


IV. The Range of Σ(f·d)


The body of data here used was intentionally adopted because the range of error in the mean arising from fluctuations in the values of Σ(f·d) is small as compared to that range in the general run of money-value data. But even with an interval small enough to give 39 classes, it can be seen that the range of Σ(f·d) is from −131 to 94, or, reduced to absolute money value, $11.25, while the minimum error is 14 units, or $.70, the latter giving an error of only $.003 in the mean.
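The conversions in this paragraph are simple unit arithmetic: one recording unit is five cents, and the error in the mean is the error in the aggregate divided by n. A short check, taking n = 210 from the worked calculation (the sum of the serial totals 56 + 71 + 44 + 26 + 13):

```python
from fractions import Fraction

UNIT = Fraction(5, 100)  # dollars per recording unit (five cents)
N = 210                  # employees: the total of the serial-number frequencies

sums_quarter = [94, 14, -131, -61, 84]  # Σ(f·d) for the five 25-cent distributions

range_in_aggregate = (max(sums_quarter) - min(sums_quarter)) * UNIT
least_aggregate = min(sums_quarter, key=abs) * UNIT
least_error_in_mean = least_aggregate / N

print(float(range_in_aggregate))             # 11.25
print(float(least_aggregate))                # 0.7
print(round(float(least_error_in_mean), 4))  # 0.0033

# The dollar and two-dollar ranges quoted below convert the same way.
print(float((279 - (-221)) * UNIT), float((579 - (-351)) * UNIT))  # 25.0 46.5
```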

The same body of data arranged in dollar groupings produces a range of Σ(f·d) from −221 to 279, or $25.00. On the other hand, this arrangement brings to light three equally good distributions which
























give an error in the mean of only $.0002. Arrangement in two-dollar groupings gives a range of error in the aggregate from −351 to 579, or $46.50. But even here, with only six classes, a distribution is found whose mean is subject to a grouping error of only $.0026.

It will readily be seen that for many variables whose values cluster not only about the mode but also, in less degree, about other values, very large differences in the mean may result from the choice of limits hit upon. If, for example, instead of the payroll exhibited here one had been chosen in which the contractual wage rates were weekly rates, we should expect to find a preponderance of rates in whole dollars, a number of rates ending in .50, a lesser number at values ending in .25 and .75, and a scattering of rates of other decimal endings. The one-dollar groupings of such data would, of course, exhibit a much greater range of values of the mean. The writer has found instances in the work of reputable writers wherein this error was nearly one-half an interval and as great as two-tenths of the mean itself.

Since the distribution here dealt with does not deviate markedly from that of the normal curve of error, a comparison of the grouping error with the probable error (or deviation)

\[ \text{P.E.}_{\text{mean}} = .6745\,\frac{\sigma}{\sqrt{n}} \]

may be made. To be sure, the number of observations is too small to permit reliance upon this comparison in this instance; but had data been chosen with a view to showing how great the grouping error often becomes even where n is very large (the intervals being large also), much more striking and reliable results would have been found.
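The comparison can be sketched as two small helpers: one for the probable error of the mean and one converting each Σ(f·d) into an error in the mean. The function names are hypothetical, and σ is left as an input because the standard deviation of this payroll is not legible in this copy; for the twenty-five-cent grouping the first error comes to about .022, the maximum value reported in the table that follows.

```python
import math


def probable_error_of_mean(sigma, n):
    """P.E. of the mean for a roughly normal population: 0.6745 * sigma / sqrt(n)."""
    return 0.6745 * sigma / math.sqrt(n)


def grouping_errors_in_mean(sums_fd, unit, n):
    """Convert each Σ(f·d), counted in recording units, to an error in the mean."""
    return [s * unit / n for s in sums_fd]


errors = grouping_errors_in_mean([94, 14, -131, -61, 84], 0.05, 210)
print([round(e, 4) for e in errors])
# With a known sigma, each error can then be expressed as a multiple of the P.E.:
# [e / probable_error_of_mean(sigma, 210) for e in errors]
```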


COMPARISON OF GROUPING ERRORS, Σ(f·d)/n, WITH PROBABLE ERROR, .6745σ/√n

  Interval   Maximum      Numerical    Minimum      .6745σ/√n   Number of Distributions in
             Value of     Minimum of   Value of                 Which Σ(f·d)/n Is Numerically
             Σ(f·d)/n     Σ(f·d)/n     Σ(f·d)/n                 Greater Than ½ P.E.
  $0.25        .022         .0030       −.032         .082              0
  $1.00        .066        −.0002       −.053         .082              7
  $2.00        .142        −.0026       −.084         .082             18






































It will be noted that even the smallest interval (which gives 39 classes) has a range of the grouping error of approximately one-third P.E. The dollar grouping (11 classes) shows a range in this error of about three-fourths P.E., and the two-dollar grouping (6 classes) exhibits a like range of over one and one-third P.E. At the same time the
















best distribution in each interval results in a grouping error too small to be a matter of concern. But the chance of obtaining a grouping error as great as one-half P.E. becomes greater as i is increased.


THE STANDARD DEVIATION AND MEASURES OF SKEWNESS 


Inferences drawn from the magnitude of the standard deviation of a 
series of values, or from differences or ratios of this measure of two or 
more series, depend in large part for their validity upon the truth of the 
theorem that the root-mean-square deviation of a set of values is 
least when deviations are measured from the arithmetic mean. It is 
well known that the value of this measure derived by treating all 
frequencies as though they occurred at the mid-values of their respec- 
tive intervals is not, in general, equal to the true measure derived from 
the variates themselves. The unadjusted value taken from a grouped series may be either too small or too large, and the range of relative error attributable to the particular grouping hit upon is, for many kinds of series, too large to be ignored. The modes of adjusting for this error (e.g., Sheppard’s correction) deal at best only with special cases and, as will be shown, are not, even then, always to be relied upon if the distribution can be so formed as to minimize the error in the second moment.

Then, too, the commonest measure of skewness, (Mean − Mode)/σ, is likely to be greatly affected by an ill-constructed distribution. For, however the value of the mode be found, the grouping chosen may materially affect either or both members of the skew ratio. And since a given grouping may yield too large a value for the mean and too small a value for the standard deviation, or vice versa, the range of relative error introduced by a faulty grouping may become very great indeed.

Unfortunately, that distribution which least displaces the mean cannot be counted upon to affect the standard deviation least. If, then, a distribution is wanted for which the directly calculated values of both the first and second moments are not materially affected by the grouping, a special test of the goodness of each of a set of distributions with respect to the second moment is also needed.

To assist in analyzing the influence of grouping upon the value of the standard deviation, let the following notation be added: let \(m_1 - M = a_1,\ m_2 - M = a_2,\ \dots,\ m_q - M = a_q\), in which M is the true value of the arithmetic mean and \(m_1, m_2, \dots, m_q\) are the magnitudes of the mid-values of the q intervals within the range; let e, employed as a subscript of d and of its associated f, denote any given value within the inclusive limits −r to r; and let c, employed as a subscript of a and of its associated p, denote any given value within the limits 1 to q.




















Then the deviation, X − M, of any observed value from the mean may be expressed as X − M = a + d, and the square of that deviation as a² + 2ad + d². For any interval, then, the sum of the squares of the deviations is

\[ \sum_{d=-r}^{r} f(a+d)^2 = f_r(a+d_r)^2 + f_{r-1}(a+d_{r-1})^2 + \dots + f_{-r}(a+d_{-r})^2 \]

and this, after expansion of the binomials and collection of terms, becomes

\[ \sum_{d=-r}^{r} f(a+d)^2 = p\,a^2 + 2a\sum_{e=0}^{r}(f_e - f_{-e})\,d_e + \sum_{e=0}^{r}(f_e + f_{-e})\,d_e^2 \qquad (2) \]


The conventional procedure of treating all the frequencies within the interval as though they occurred at the mid-value substitutes for the right-hand member of (2) the value of its first term only. It is not impossible, of course, that this makeshift should be free of error in a given case; for the second and third terms may be numerically equal and of opposite sign. It may be worthwhile to note that if the frequencies congregate densely between the mid-value and the mean, the first term becomes large (as compared with the left-hand member) and the second term becomes numerically large, but negative. When the mid-value lies between the mean and the massing of the frequencies, the first term becomes smaller and the second becomes algebraically larger. If the arrangement of the frequencies is symmetrical with respect to the mid-value, the second term vanishes; for fₑ − f₋ₑ becomes zero for all values of e.
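Identity (2) is easy to confirm numerically. In the sketch below the function name and the frequencies are hypothetical, and deviations are measured in recording units so that dₑ = e. The conventional procedure would report only the first term, 99, against a true sum of squared deviations of 114.

```python
def interval_terms(freqs, a):
    """The three terms on the right of equation (2) for a single interval.

    freqs: dict mapping deviation d (recording units, -r..r) to frequency f_d.
    a:     deviation of the interval's mid-value from the true mean M.
    """
    p = sum(freqs.values())
    first = p * a * a
    positive = {abs(d) for d in freqs if d != 0}   # the values e = 1..r present
    second = 2 * a * sum((freqs.get(e, 0) - freqs.get(-e, 0)) * e for e in positive)
    third = sum((freqs.get(e, 0) + freqs.get(-e, 0)) * e * e for e in positive)
    return first, second, third


# Hypothetical interval of five units whose mid-value lies 3 units above the mean.
freqs = {-2: 3, -1: 1, 0: 2, 1: 4, 2: 1}
a = 3
terms = interval_terms(freqs, a)
direct = sum(f * (a + d) ** 2 for d, f in freqs.items())
print(terms, sum(terms), direct)  # (99, -6, 21) 114 114
```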

Of course, the third term is unaffected both by the numerical value and sign of the deviation of the mid-value from the mean and by the signs of the deviations of the observed values from the mid-value.

Little can be concluded from what may occur in a single interval; for the error in the first term of (2), i.e., Σ(X − M)² − pa², in intervals in the ascending branch of the curve tends to be offset by the corresponding error in intervals in the descending branch. Consideration of the single interval merely makes clear certain of the effects of arrangement within the interval upon the second and third terms of the right-hand member of (2), and of the effects of differing distances m − M upon the first and second terms of the same member. These effects become important when investigation of the distribution as a whole is made. For the whole distribution the equation may be written:


\[ n\sigma^2 = \sum_{c=1}^{q} p_c a_c^2 + 2\sum_{c=1}^{q}\Big[a_c \sum_{e=0}^{r}(f_e - f_{-e})\,d_e\Big] + \sum_{c=1}^{q}\sum_{e=0}^{r}(f_e + f_{-e})\,d_e^2 \qquad (3) \]

It will be noted that in the right-hand member of (3) the first and third terms must always be positive. Hence the only condition in which nσ² can equal \(\sum_{c=1}^{q} p_c a_c^2\) is that the middle term of (3) become negative and equal, numerically, to the third. Consider three types of smooth symmetrical curves, one concave, one flat, and one convex, with respect to the X axis. In a curve of the concave type it will be seen that in every interval in which a is positive every product f₋ₑ·d₋ₑ will be exceeded numerically by the product fₑ·dₑ. Likewise, when a is negative every product f₋ₑ·d₋ₑ will be numerically greater than the product fₑ·dₑ. Thus, in all intervals (except where a = 0) the product \(2a\sum_{e=0}^{r}(f_e - f_{-e})\,d_e\) will be positive. Hence \(n\sigma^2 > \sum_{c=1}^{q} p_c a_c^2\) in concave distributions. To treat all frequencies in this type of distribution as though they occurred at the mid-values of their respective intervals would always give too small a value for σ.

The similarly found values of σ for a flat distribution (or for one of any general shape in which the frequencies are uniformly distributed within each interval) will also be too small; for every product fₑ·dₑ will be cancelled within the interval by the product f₋ₑ·d₋ₑ. The error in the unadjusted value of σ for such distributions is measured by the magnitude of the third term of (3) above, instead of by the sum of the second and third terms as was found true for concave curves.

For smooth convex curves, on the other hand, the middle term of (3) will be seen, by a corresponding inspection, always to be negative. Sheppard has shown¹ that for certain high-peaked curves of this type this second term is numerically greater² than the third. And it is for certain curves of this type that he devised his corrections.
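Equation (3) and the three cases just described can be checked on small symmetrical examples. The sketch below is illustrative only: the data, the function name, and the reading of the text’s “concave” case as a U-shaped distribution (frequencies rising away from the mean within each outer interval) are assumptions. Exact fractions keep the identity first + second + third = nσ² free of rounding.

```python
from fractions import Fraction


def equation_3_terms(primary, start, i):
    """First, second and third terms of (3), plus the directly computed nσ².

    primary: dict mapping a reading (whole recording units) to its frequency.
    start:   lowest reading of some class; classes hold i consecutive readings.
    i is assumed odd, so every mid-value falls on a reading.
    """
    n = sum(primary.values())
    mean = Fraction(sum(x * f for x, f in primary.items()), n)
    r = (i - 1) // 2
    classes = {}
    for x, f in primary.items():
        mid = start + ((x - start) // i) * i + r
        cls = classes.setdefault(mid, {})
        cls[x - mid] = cls.get(x - mid, 0) + f
    first = second = third = Fraction(0)
    for mid, fr in classes.items():
        a = mid - mean
        first += sum(fr.values()) * a * a
        for e in range(1, r + 1):
            f_e, f_minus_e = fr.get(e, 0), fr.get(-e, 0)
            second += 2 * a * (f_e - f_minus_e) * e
            third += (f_e + f_minus_e) * e * e
    n_sigma2 = sum(f * (x - mean) ** 2 for x, f in primary.items())
    return first, second, third, n_sigma2


# Hypothetical symmetrical distributions over readings 0..8, grouped in threes.
shapes = {
    "U-shaped": [5, 3, 1, 1, 1, 1, 1, 3, 5],
    "flat": [1] * 9,
    "peaked": [1, 2, 3, 4, 5, 4, 3, 2, 1],
}
for name, freqs in shapes.items():
    terms = equation_3_terms(dict(enumerate(freqs)), start=0, i=3)
    print(name, [int(t) for t in terms])
# U-shaped [162, 48, 14, 224]   first term too small
# flat [54, 0, 6, 60]           too small by the third term alone
# peaked [108, -24, 16, 100]    first term too large, as in Sheppard's case
```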

When we consider irregular or complex curves of all degrees of asymmetry, multi-modality, etc., the problem of adjustment for the grouping error becomes hopelessly complex. The workable problem is rather to devise an empirical method of so forming the distribution that adjustment is practically unnecessary. It will be shown, for example, that even in distributions that appear to come within, or closely to approximate, the class with which Sheppard dealt, the application of his corrections is often inadmissible; for the distribution, regular on its face, may prove upon examination to be periodic within the intervals. In the illustrative data employed in the averages problem above, for example, there are tendencies of the variates to pile

¹ Article previously cited.
² This problem of convergence, i.e., the definition of the degree of convexity in a curve of any particular kind necessary to make the second and third terms of (3) numerically equal, is interesting. But since we are here concerned primarily with the irregular distributions existing in nature rather than with simple smooth symmetrical curves, the matter of convergence is left aside.

























up at several points within the dollar intervals. And when the mid-value of the interval is so placed as to throw these periodic peak densities below the mid-values, so that even in the ascending branch of the curve as a whole there are more frequencies below than above the mid-values, it is found that the sum of products, frequency times the squared deviation of mid-value from the mean, is too small rather than too large.

It will not do to say that the “little waves riding the back (or the trough) of the big wave” are chance fluctuations, although they may be so in a particular sample. Anyone familiar with the details of money-value series knows that this phenomenon appears and persists in populations of all magnitudes and that it can confidently be predicted to continue. And such behavior within ordinate intervals is just as much a statistical fact to be recognized as is the period of the sine curve or the inflections of the normal curve of error.

In the method here proposed for finding that distribution, of given interval, which gives a value of σ least departing from its true value, use is made of the previous work in finding the distribution least disturbing the mean. Equation (3) supra is also brought in. The objection to the proposed method that is most certainly to be anticipated is that it is laborious. And that is a proper objection to the method as compared to one that achieves a like precision with less labor.¹ But the actual amount of labor required here is not believed to be uneconomical. As compared with the amount of care and effort ordinarily bestowed upon obtaining nice accuracy in the original data, the amount of work necessary to safeguard such accuracy as was originally achieved becomes very small indeed.

Assuming that the true mean has been found in the process of investigating the effect of the several distributions upon the mean, what remains to be done is to determine the true value of Σ(X − M)² and the i values of \(\sum_{c=1}^{q} p_c a_c^2\). To obtain Σ(X − M)², select any of the i distributions and find the values of the three terms of equation (3). The value of the first term is found by the ordinary process of finding the sum of products, frequency times squared deviations of the mid-values from the mean. The second and third terms may be found as follows. Lay off along the primary distribution a set of limits of intervals of the desired common magnitude, say twenty units. Within the limits of each interval write, opposite each frequency, the product of the frequency times its deviation from the mid-value (or, where more convenient, its deviation from an arbitrary point of origin). Selecting, for purposes of illustration, that distribution of twenty-unit (one-dollar) intervals the mid-values of which fall at scale values ending in $0.475, Table II is set up to show the process.

¹ When one of the motor-driven, ribbon-printing calculating machines is available, the work of calculating the true value of the standard deviation by the method illustrated becomes, because of the smaller numbers involved, less for large values of n than the squaring of all deviations of all variates from the mean.

TABLE II
CALCULATION OF SECOND AND THIRD TERMS OF EQUATION (3)
Products of f·d in Intervals:

[The body of Table II and its notes are printed sideways in the source and are not legible in this scan.]

Then, by the ordinary method, find for each of the distributions the value of the first term, $\sum_{c=1}^{q} p_c a_c^2$. Differences between these values and that of $n\sigma^2$ determine which of the several distributions will give an adjusted standard deviation of least grouping error. The precise magnitude of the error is also made plain.

For the data employed here the range of error in the standard devia- 
tion (dollar grouping) is from —.064 to .022, or .086 units, that is 
$0.43. And the range of relative error is nearly five per cent of the 
true value. The minimum numerical error, on the other hand, is 
found to be $.003, or .0017 of the true value. It happens also that the 
distribution to which the least error attaches (the one in which mid- 
values fall at $X.425) disturbs the mean but slightly—the absolute 
error being .0045 and the relative error being .0001. 

Systematic test of the method discussed will seldom fail to find some 
distribution (even of large interval) for which the grouping error in the 
first and second moments is too small to be of practical importance. 
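The systematic test just described can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's procedure. It assumes the raw variates are integers in the five-cent units of the original record. It groups them into twenty-unit (one-dollar) intervals for each of the twenty possible interval origins and reports, for each origin, the grouping error (true minus grouped) of the mean and of the standard deviation. The function names and the sample record are hypothetical.

```python
import math
from collections import Counter


def true_moments(values):
    """Mean and standard deviation computed from the ungrouped variates."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(var)


def grouped_moments(values, width, origin):
    """Mean and standard deviation computed from interval mid-values.

    Interval k covers [origin + k*width, origin + (k+1)*width), and every
    variate in it is treated as if it sat at the interval's mid-value.
    """
    counts = Counter(math.floor((v - origin) / width) for v in values)
    n = sum(counts.values())
    mid = {k: origin + (k + 0.5) * width for k in counts}
    mean = sum(f * mid[k] for k, f in counts.items()) / n
    var = sum(f * (mid[k] - mean) ** 2 for k, f in counts.items()) / n
    return mean, math.sqrt(var)


def sweep_origins(values, width=20):
    """Grouping error of the mean and s.d. for each possible interval origin.

    Origins sit half a unit off the integer record values, so no variate
    ever falls exactly on an interval limit. Errors are true minus grouped.
    """
    true_mean, true_sd = true_moments(values)
    results = []
    for i in range(width):
        origin = i + 0.5
        mean, sd = grouped_moments(values, width, origin)
        results.append((origin, true_mean - mean, true_sd - sd))
    return results


if __name__ == "__main__":
    # Hypothetical record in five-cent units (540 units = $27.00); not the author's data.
    sample = [540, 541, 545, 549, 552, 556, 560, 560, 563, 570, 580, 580]
    for origin, mean_err, sd_err in sweep_origins(sample):
        print(f"origin {origin:5.1f}  mean error {mean_err:+.3f}  s.d. error {sd_err:+.3f}")
    best = min(sweep_origins(sample), key=lambda r: abs(r[2]))
    print("least s.d. error at origin", best[0])
```

With integer data, an interval width of 1 and an origin of -0.5, every mid-value coincides with a variate, so the grouped and true moments agree. This gives a quick sanity check on the grouping logic.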

The distribution dealt with here only apparently admits the employ- 
ment of Sheppard’s correction. The variable is practically continuous; 
for, although the record is at intervals of five cents, that interval is 
small as compared to the minimum variates. The conformation of 
any of the dollar groupings, once formed, comes within the type of 
curve with which he dealt. The reason for its inappropriateness is the 
periodicity found within the intervals. But one who deals with any 
one of the distributions, without scrutinizing the original array, cannot 
know of the existence of this phenomenon and, if he proceeds to apply 
the correction, will have a large chance of finding an adjusted value fur- 
ther in error than his unadjusted value. Table III is set up to show the 
results of incautious use of Sheppard’s formula. 
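The footnote counts attached to Table III can be reproduced with a short sketch. Sheppard's correction for the second moment subtracts $h^2/12$ from the unadjusted value. With one-dollar intervals ($h = 1$), the correction is about .083, which matches the gap between the unadjusted and adjusted columns of the table. The true variance used here, 3.077, is an assumption inferred from the table's first row (3.305 less .228); it is not printed in this section. The function names are hypothetical.

```python
# Unadjusted second moments V2 of the 20 dollar-groupings, as printed in
# Table III, in order of first-interval mid-value ($X.025, $X.075, ..., $X.975).
V2 = [3.305, 3.286, 3.232, 3.216, 3.184, 3.227, 3.206, 3.199, 3.066, 3.048,
      3.000, 3.000, 3.164, 3.131, 3.098, 3.181, 3.170, 3.204, 3.235, 3.226]

SIGMA2_TRUE = 3.077  # assumption: inferred from the first row (3.305 - .228)
H = 1.0              # interval width, in dollars


def sheppard_adjust(v2, h=H):
    """Sheppard's correction for the second moment: subtract h**2 / 12."""
    return v2 - h * h / 12.0


def audit(v2_values, sigma2_true, h=H):
    """Count groupings where the correction hurts, and where it overshoots.

    Returns (number of groupings whose unadjusted value is nearer the truth,
             number whose adjusted value errs by more than the correction itself).
    """
    correction = h * h / 12.0
    unadjusted_better = 0
    overshoots = 0
    for v2 in v2_values:
        err_raw = sigma2_true - v2
        err_adj = sigma2_true - sheppard_adjust(v2, h)
        if abs(err_raw) < abs(err_adj):
            unadjusted_better += 1
        if abs(err_adj) > correction:
            overshoots += 1
    return unadjusted_better, overshoots


if __name__ == "__main__":
    print(audit(V2, SIGMA2_TRUE))  # (5, 6)
```

The two counts agree with the table's footnotes. In five groupings the unadjusted value is nearer correct. In six, the adjusted value errs by more than the correction factor used.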

Measures of skewness that involve the use of the mode are notori- 
ously unreliable because of the unstable and indeterminate character of 
that average. And when, as illustrated, the values of the mean and of 
the standard deviation are materially affected by the grouping it can 
readily be seen that measures of skewness may become wholly erratic. 
If, for each of the twenty dollar-groupings, the mode is found by the
common method of inverse proportional interpolation, the value of the
mode fluctuates from $27.878 to $27.087. The other eighteen values
are almost uniformly distributed between these limits. The corresponding range of the measure of skewness, $(\text{mean} - \text{mode})/\sigma$, is from
−.464 to −.072. If Sheppard's correction is first applied to the
standard deviation, the corresponding range is from −.469 to −.073.
TABLE III

Table Showing the Results of Applying Sheppard's Corrections
(His Formula 30) to the Unadjusted Values of the Second Moments about
the Mean of the 20 Dollar-Groupings. (σ² denotes the true value, V₂
the unadjusted value, and μ₂ the adjusted value.)

Midvalue of
First Interval      V₂        μ₂        σ² − V₂     σ² − μ₂
$X.025            3.305     3.221      −.228       −.145‡
$X.075            3.286     3.202      −.209       −.126‡
$X.125            3.232     3.149      −.156       −.072
$X.175            3.216     (illeg.)   −.139       −.056
$X.225            3.184     3.101      −.107       −.024
$X.275            3.227     3.143      −.150       −.067
$X.325            3.206     3.122      −.129       −.046
$X.375            3.199     3.116      (illeg.)    −.039
$X.425            3.066     2.983       .010*       .094‡
$X.475            3.048     2.965       .028*      (illeg.)‡
$X.525            3.000     2.917       .076*       .160‡
$X.575            3.000     2.917       .076*       .160‡
$X.625            3.164     3.081      −.088       −.004†
$X.675            3.131     3.048      −.055        .029
$X.725            3.098     3.014      −.021*       .062
$X.775            3.181     3.098      −.105       −.021
$X.825            3.170     3.087      −.093       −.010
$X.875            3.204     3.120      −.127       −.044
$X.925            3.235     3.151      −.158       −.075
$X.975            3.226     3.143      −.150       −.066

* In these five instances (one in four) the unadjusted value is nearer
correct than the adjusted value.
† In only one instance, and then by a slight margin, is the adjusted value
nearer correct than the unadjusted value of the best distribution.
‡ In six instances the adjusted value is in error by more than the correction
factor used.
Solution of the usual measure of reliability, $\sigma_\sigma = \sigma/\sqrt{2n}$, in the distribution
having its interval mid-values at $X.425 gives a result of .083.
It is interesting to note that the error due to grouping is, even in this
small sample, almost as large in several of the distributions as the
standard error of sampling. To calculate the latter measure and affix it to
σ without ascertaining the grouping error is to give the reader a
spurious assurance of reliability.
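The comparison drawn above can be made explicit in a few lines. This sketch assumes a true variance of 3.077, inferred from Table III and not printed in this section. It sets the grouping error in σ (true minus grouped) for the worst ($X.025) and best ($X.425) dollar-groupings against the reported standard error of .083. `standard_error_of_sd` is a hypothetical helper implementing the usual large-sample formula $\sigma/\sqrt{2n}$.

```python
import math

SIGMA2_TRUE = 3.077   # assumption: inferred from Table III (3.305 - .228)
SE_REPORTED = 0.083   # standard error of sigma for the $X.425 grouping, from the text


def standard_error_of_sd(sigma: float, n: int) -> float:
    """Usual large-sample standard error of a standard deviation, sigma / sqrt(2n)."""
    return sigma / math.sqrt(2 * n)


def grouping_error_of_sd(v2: float, sigma2_true: float = SIGMA2_TRUE) -> float:
    """True minus grouped standard deviation, from an unadjusted second moment."""
    return math.sqrt(sigma2_true) - math.sqrt(v2)


if __name__ == "__main__":
    worst = grouping_error_of_sd(3.305)   # $X.025 grouping
    best = grouping_error_of_sd(3.066)    # $X.425 grouping
    print(f"worst {worst:+.3f}, ratio to s.e. {abs(worst) / SE_REPORTED:.2f}")
    print(f"best  {best:+.3f}, ratio to s.e. {abs(best) / SE_REPORTED:.2f}")
```

The worst grouping errs by roughly three-quarters of the sampling standard error. The best grouping errs by only about $.003, in line with the figures given earlier in the text.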







If the X value of the maximum ordinate is found by Pearson's
method¹ in order to use his more refined measure of skewness, it will
quickly be found that values of the mode appear to lie between the
actual mean and median! And this despite the fact that these true
values are separated by $0.135, or .077σ.

If the value of the mode be found by the rough formula, mode
= mean − 3(mean − median), the numerical value is $27.385 and the
value of the skew is −.231. The values of the mode and of the skew
for that distribution (see page 146) which least disturbed the value
of the second moment (and affected the first moment only slightly)
are $27.387 and −.234, the mode being found by interpolation.
For series like the one dealt with here, then, it becomes plain that the 
effect of chance location of the interval limits, upon measures of skew- 
ness, is much greater than the effect upon the more important con- 
stants. 
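The rough formula and Pearson's measure of skewness combine as in the sketch below. Substituting mode = mean − 3(mean − median) gives a skewness of 3(mean − median)/σ, so only the mean-median gap and σ matter. The gap of $0.135 is from the text. σ ≈ 1.754 is an assumption implied by "$0.135 or .077σ". The absolute mean of 27.0 and the function names are hypothetical.

```python
def empirical_mode(mean: float, median: float) -> float:
    """Rough mode from the empirical relation mode = mean - 3(mean - median)."""
    return mean - 3.0 * (mean - median)


def pearson_skewness(mean: float, mode: float, sd: float) -> float:
    """Pearson's measure of skewness, (mean - mode) / sd."""
    return (mean - mode) / sd


if __name__ == "__main__":
    mean = 27.0             # hypothetical absolute level; only the gap matters
    median = mean + 0.135   # median exceeds mean by $0.135, as in the text
    sd = 1.754              # assumption: implied by "$0.135 or .077 sigma"
    mode = empirical_mode(mean, median)
    print(round(pearson_skewness(mean, mode, sd), 3))
```

The result reproduces the skew of −.231 quoted for the rough formula.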

The foregoing non-mathematical opening of this problem in the 
theory of measurement is not put forward as a conclusive treatment of 
the subject. The author wishes rather to make it plain that there 
are many interesting and important matters to be considered in 
statistical grouping. It is hoped that students of statistics will take 
them up for conclusive mathematical treatment. 


1 For a brief treatment of this, see Frederick C. Mills, Statistical Methods, p. 546.













FORECASTING THE PRICE OF WHEAT 


By C. C. Bosland, University of Michigan


I 


Forecasting of prices involves knowledge of the forces or factors that 
cause prices to change. An attempt to forecast price must be based 
on the knowledge of these factors and the importance of their effect 
upon price. This study is undertaken with the aim of determining, 
by statistical measurement, what factors cause the price of wheat to 
be what it is, and to what extent variations in these factors cause 
corresponding changes in price. 

It is evident that the problem is complicated by the wide area in 
which these price-making forces exert their influence. A study of 
wheat prices must be a study that is world wide in its scope. The 
validity of the results of any statistical investigation depends upon the 
adequacy and accuracy of the figures used. An analysis based upon
world figures, which are collected with varying degrees of accuracy,
is therefore subject to an unknown degree of error. However, in the
main, statistical data perhaps represent what has actually taken place
as well as any available information, and an investigation based upon
them may prove to be of some scientific value, and perhaps of practical
value to the farmer, speculator, miller, manufacturer, and retail dealer,
who can profitably make economic adjustments consistent with the
predicted future.


II 


Beginning with the economist’s statement that price is determined 
by conditions of supply and demand, the problem resolves itself into 
one of determining the forces that cause variations in these conditions. 
Since the problem is undertaken with the aim of forecasting the price
in the United States, and since the United States is the largest wheat-producing
country in the world, some emphasis is laid upon conditions
in this country; though this emphasis may seem undue to some, it will
make no great difference in the results obtained.

One of the important factors affecting supply is world production. 
This includes the actual amount produced and the amount that is 
expected to come on the market in the near future. Wheat is not an 
extremely perishable product, and buyers and sellers investigate prob- 
able future supply and probable future price before they decide upon 













the price at which they will buy or sell. The conditions affecting 
present and prospective production are: First, acreage planted to wheat,
which is dependent upon existing facilities for wheat production, its
place in crop rotation, and the relative profitability of the wheat crop
as compared with other crops that could be planted. Second, the
condition of the crop, or yield, which depends upon climatic conditions
and upon diseases and insect pests, is important. Consideration must also
be given to the time that elapses before this crop will be placed on the 
market and the possibility of changes in volume due to changes in 
conditions affecting the growing crop. 

Carry-over from previous years is also an important supply factor 
which must be taken into consideration in explaining variations in the 
price of wheat. This includes the stocks of wheat in terminal and 
country elevators, stocks on farms, and supplies in the form of flour. 

On the demand side, the use of wheat as human food is obviously 
the predominant factor. The volume of consumption is dependent 
upon many conditions, the most important of which are probably the 
incomes and purchasing power of all classes; the custom of consuming 
certain wheat products; and the substitutes available and their prices. 

If correct statistical information on all of these factors were available 
the problem of analyzing price would be a more exact and satisfying 
one. Unfortunately such is not the case. In the first place, it is 
extremely difficult, if not practically impossible, to obtain a direct
statistical measure of demand. Demand is essentially an individual 
phenomenon, subject to infinite variation as are the likes, dislikes, 
incomes, and whims of human beings. This difficulty may be obviated 
in some price studies by assuming that demand is fairly constant, in- 
creasing at about the same rate that population increases. This is 
probably most applicable to staple food products. It is not, however, 
a reliable assumption when one food product, rather than food products 
in general, is under consideration, as it seems reasonable to assume 
that there is substitution of one food product for another when prices 
of these make substitution profitable. 

Another method of obviating the difficulty of securing a measure of 
demand is that of explaining a changing relationship of supply to price 
by change in demand. Thus, in the case of world production and 
carry-over in relation to purchasing power of wheat, it is found that 
“purchasing power” (price divided by price level) tends to remain 
on the same level while there is a distinct upward trend in production 
and carry-over. From this it can be deduced that the trend of pro- 
duction and carry-over represents a curve of increasing demand, for if 
this were not true, purchasing power would not remain on a level with 
















decline. This method of determining demand is, however, indirect, 
depending upon supply and price figures, and is therefore of little 
direct value as a separate factor. 

Because of the inadequacy of data on the side of demand the analysis 
must be based largely upon factors influencing the supply of wheat. 

On the side of supply, the difficulty of obtaining statistical informa- 
tion is also serious. Figures indicating probable future conditions of 
supply are practically impossible to obtain. The International In- 
stitute of Agriculture has taken steps to remedy this difficulty, but its 
figures are not available for a sufficiently long period of time for this 
analysis. Furthermore, statistical information as to what actually has
taken place is subject to some degree of error, as was previously pointed 
out. However, some of the supply series, even though they are not 
precisely accurate, are sufficiently accurate for the purpose of explain- 
ing price, and form perhaps the best basis upon which forecasts of 
price changes can be made. 

Assuming that demand remains relatively constant or changes 
uniformly, changes in price will come about through changes in the 
volume of supply. If the supply for any year is double the amount of 
supply for the preceding year, we would expect the price to fall, or 
if there was a decrease in supply we would expect the opposite change 
in price to take place. But how much will the price vary with a given 
change in the volume of supply? Here statistical methods must be 
applied in order first, to obtain a numerical statement of the relation- 
ship between price and supply, and, second, to derive a formula by 
which changes in price can be predicted by knowledge of changes in 
supply. The first is expressed by the coefficient of correlation, and the 
second by the regression equation. 
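The two measures named above can be sketched as follows. This is a minimal illustration, not the author's computation. `pearson_r` gives the coefficient of correlation, `regression` gives the least-squares line of price on supply, and `slope_from_summary` recovers the regression slope from r and the two standard deviations (b = r·s_y/s_x). All three names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Coefficient of correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def regression(x, y):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


def slope_from_summary(r, sd_y, sd_x):
    """Regression slope of y on x from r and the two standard deviations."""
    return r * sd_y / sd_x
```

As a usage example, the figures reported later in the article (r = −.57, a price standard deviation of .13 and a production standard deviation of .06) give `slope_from_summary(-0.57, 0.13, 0.06)`, about −1.235. On that basis, each one per cent deviation of supply from trend goes with roughly a 1.2 per cent opposite deviation of price.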

For the present study, the years 1896-1921 have been taken because 
information for these years seems to be superior both as to volume and 
accuracy. Furthermore, the results will be comparable with results 
to be obtained at the present time, due to a substantial similarity of 
conditions. 

A method of forecasting the yearly average price will be attempted, 
leaving fluctuations within the year for further study. 


III 


The first problem is that of the price to use in the analysis. It would 
seem to make little difference what price is used so long as it is used 
consistently, for if there were no interfering conditions the prices in 
the central markets of the world should differ only by the cost of 
transportation. This was found to be substantially true. Through 














a correlation of the Chicago price with prices in Winnipeg, Minneapolis, 
and Liverpool, it was found that the prices in these central markets 
varied with a high degree of harmony, indicating that the Chicago 
price may be used as representative of the world price.¹

Because of the fact that about 80 per cent of the world’s yearly 
wheat crop is harvested during and following the month of June, the 
production for any year begins to exert its influence on price in June 
and continues to be felt until the next year’s crop is forthcoming. 
Thus crop-year price rather than calendar-year price was used. This 
crop-year price is determined by an average of monthly highs and lows 
for cash wheat in the Chicago Market from June of the year in question 
to the following May. 

Price, however, must be still further refined in order that the relation- 
ship between the factors in question and price can be understood. 
This necessity arises from the changing price level. Thus, production 
may remain the same during a period of time, but price may rise 
because the general price level has gone up, owing to the decreased
purchasing power of the monetary unit. To take account of this, the
Chicago price is divided by the “All Commodities” index number of
the Bureau of Labor Statistics, figured on a crop-year basis. Thus
“purchasing power” (reduced price) rather than actual price will be
used; the word “price” will hereafter designate “purchasing power,”
or actual price divided by the index number.
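The crop-year price and its deflation can be sketched as below. The crop-year price is the average of the monthly highs and lows for cash wheat from June of the year in question through the following May, and purchasing power is that price divided by the index number. This is a sketch under stated assumptions. The `monthly` mapping and the function names are hypothetical, and the index is assumed to be expressed as a ratio to its base (1.00 = base period).

```python
from statistics import mean


def crop_year_price(monthly, year):
    """Average of monthly highs and lows, June of `year` through May of year + 1.

    `monthly` is a hypothetical mapping (year, month) -> (high, low)
    for cash wheat in the Chicago market.
    """
    months = [(year, m) for m in range(6, 13)] + [(year + 1, m) for m in range(1, 6)]
    quotes = []
    for key in months:
        high, low = monthly[key]
        quotes.extend([high, low])
    return mean(quotes)


def purchasing_power(price, index):
    """Price deflated by a price-level index expressed as a ratio to its base."""
    return price / index
```

A usage note: if the crop-year price is $1.50 and the crop-year index stands at 1.25 times its base, the purchasing power is 1.20.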


IV 


Turning now to an investigation of the relationship between price 
and supply, what degree of relationship can be said to exist between 
supply factors and the price of wheat? It was noted that supply 
should include the amounts expected to become available in the near 
future as well as the amount actually in existence. However, due to the 
lack of information regarding anticipated supply, attention must be 
confined largely to the present stock on hand. 

The most important factor affecting the supply of wheat for any 
year is world production, and price should be expected to vary in- 
versely as this amount is large or small. But a single year’s produc- 
tion is not the entire explanation of supply, as the amount carried 
over from previous years is, of course, as effective in determining 
price (in accordance with its quantity) as is the amount coming on to 
the market during the year in question. Since the crop year used in 


1 The coefficient of correlation between the Chicago price and Winnipeg price, and between the 
Chicago price and Minneapolis price was approximately +.95, while that of Chicago and Liverpool was 
found to be +.80. The change in freight rates during the period under consideration is partly responsible for the lower coefficient of correlation in the latter case.







this analysis begins on June first, the world visible supply on that date 
will be taken as indicative of the carry-over from previous years. 

What, then, is the relationship between this supply factor for a given
year and price for that year? By fitting trends to the production plus
world-visible-supply-on-June-first figures and to the price figures, and
correlating percentage deviations from trend of production plus carry-over
with percentage deviations of price from trend, for the years
1896 to 1921, a coefficient of correlation of −0.57 is obtained. The
standard deviation of price is .13 (13 per cent) and that of production
is .06 (6 per cent). This indicates that there is a significant inverse
relationship between price and production plus carry-over, but the
relatively low coefficient of correlation indicates that the relationship
is not definite enough for changes in price to be forecast with any
degree of accuracy from knowledge of changes in this factor.


TABLE I
DATA USED IN CORRELATION ANALYSIS

Columns: Purchasing Power (Price ÷ Index)¹ | World Production Plus Carry-Over, 000 bu.² | U.S. Acreage Harvested, 000 acres³ | U.S. Exports, 000 bu.⁴ | Imports, Principal Importing Countries (Wheat and Flour), 000 bu.⁵

[Yearly values not recoverable from the scan.]

1 Price: Chicago Board of Trade, Yearbook, 1923. Index: Bureau of Labor Statistics (U. S.), Monthly Labor Review.
2 World visible supply on June 1: United States Department of Agriculture, Yearbook, 1922, p. 607. World production: United States Department of Agriculture, Yearbook, 1922, p. 582.
3 United States acreage harvested: United States Department of Agriculture, Yearbook, 1922, p.
4 United States exports: United States Department of Agriculture, Yearbook, 1922, p. 583.
5 International Yearbook of Agricultural Statistics, 1909-1921, Table 118.
The inability of world production plus carry-over fully to explain 
price indicates that there are probably other supply factors that must 







receive consideration. It might appear that any supply factor would 
be of influence only in so far as it made the actual visible supply (or 
potential supply) large or small, and would therefore bear no independ- 
ent relation to price. On the other hand, factors such as acreage 
harvested and exports might possess an individual relationship to 
price, due to the fact that the effect of each on price might not be 
measured through world production plus carry-over. A determination 
of the separate influence of these supply factors might help to explain
price. 

The United States is the world’s largest producer of wheat and certain 
conditions in this country might be expected to be effective in determin- 
ing the Chicago price. The yearly acreage harvested here might be an 
influence affecting buyers’ and sellers’ attitudes outside of its influence 
on world production which is later realized. The same method used 
in determining the relationship between price and production plus 
visible-supply-on-June-first, is used in determining the relationship 
between United States acreage and prices. Percentage deviations 
from trend are correlated. The degree of relationship between this 
factor and price is measured by the coefficient of correlation of —.20, 
indicating that there is very little relationship between acreage har- 
vested in this country and price. 

Another supply factor which might be expected to influence the 
Chicago price is the export of wheat from the United States. Since
the price of wheat is set in a market that is world wide and since the 
United States contributes to that market in the form of exports, the 
amount of these exports may be of distinct significance in determining 
price. Percentage deviations of exports from trend are correlated with 
percentage deviations of price from trend. The result is a coefficient 
of correlation of —.112. This indicates little or no relationship between 
United States exports and price. 

Thus far factors affecting the supply of wheat have been the subject 
of this study. Attention will now be given to a supposed measure of 
the demand for wheat—imports into the principal wheat importing 
countries of the world. Demand, of course, is made up not only of
the demand of importing countries; that of exporting countries is
undoubtedly also a factor in determining price, but it is of influence
only in so far as it affects the quantity available for exportation.
Thus the price that the importing countries will pay for this surplus
sets a limit above which the price in any exporting country (costs of
transportation, tariffs, etc. being ignored) cannot go, and this demand
is the one that is most significant in the determination of price. Net
imports into Great Britain and Ireland,







Germany, Italy, Belgium, Netherlands, France, Switzerland, Austria, 
Brazil, and Egypt go to make up this total. Information concerning 
these imports is available only as far back as 1909, and the
observations are therefore too few for an accurate determination of the
relationship. The correlation of percentage deviations of imports 
from trend and of percentage deviations of price from trend results in a 
coefficient of correlation of —.41. This indicates that these importing 
countries tend to be price sensitive and perhaps make up the most 
elastic element in the demand curve for wheat. If there is a large 
surplus of production in the surplus producing countries, the price will 
tend to be low because the surplus can be disposed of only at lower 
prices. The coefficient of correlation is too low and the observations 
too few to be of much value in explaining price. 

The degrees of relationship between world supply and carry-over, 
United States acreage harvested, United States exports, and imports 
into the principal importing countries of the world have been deter- 
mined. If each of these has a distinct relationship to price, that is, a 
relationship which it alone has to price, it would appear that they to- 
gether would go far toward the explanation of changes in price. The 
independent influence of each factor, and that of all factors taken 
together will be tested by a partial and multiple correlation analysis. 
In this analysis, one factor, imports into the principal importing coun- 
tries of the world, will be disregarded because of the fact that figures 
for this factor are available for a relatively short period and are there- 
fore not comparable to the figures for the other factors. The multiple 
correlation analysis of the relationship of the remaining three factors to 
price results in a coefficient of multiple correlation of .59 which is but 
little higher than the original correlation between world production 
plus carry-over and price (—.57). One is forced to the conclusion 
that the factors do not exert a separate influence on price but each is 
measured through the other or all are measured through one. Through 
a partial correlation analysis this factor is found to be world production 
plus carry-over (See Table II). The influences of United States 
acreage harvested and United States exports are measured through 
world production plus carry-over. They affect price only in so far 
as they affect or are affected by this factor. 
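The first-order partial coefficients in Table II follow from the zero-order ones by the standard formula $r_{12.3} = (r_{12} - r_{13} r_{23}) / \sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}$. The sketch below recomputes three of them from the zero-order values printed in the table. Because the inputs are rounded to two places, recomputed values can differ slightly in the last place. The function name is hypothetical.

```python
import math


def partial_r(r_ab, r_ac, r_bc):
    """First-order partial correlation of a and b with c held constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Zero-order coefficients from Table II (1 = price, 2 = world production plus
# carry-over, 3 = U.S. acreage harvested, 4 = U.S. exports).
r = {"12": -0.57, "13": -0.20, "14": -0.11, "23": 0.37, "24": -0.03, "34": 0.55}

if __name__ == "__main__":
    r12_3 = partial_r(r["12"], r["13"], r["23"])   # price and supply, acreage held constant
    r12_4 = partial_r(r["12"], r["14"], r["24"])   # price and supply, exports held constant
    r34_2 = partial_r(r["34"], r["23"], r["24"])   # acreage and exports, supply held constant
    print(round(r12_3, 2), round(r12_4, 2), round(r34_2, 2))
```

Holding acreage or exports constant barely changes the price-supply coefficient. This is the pattern the text reads as acreage and exports acting only through world production plus carry-over.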

It is evident that a coefficient of multiple correlation of .59 is too low
to be of any great value in forecasting the price of wheat. It indicates
that world production plus carry-over is an important factor in the
determination of price, but percentage deviations of this factor from
trend do not seem to be closely enough associated with percentage
deviations of price for it alone to explain changes in price or to be of
value alone in forecasting price. Other factors must be found, or
a different method must be applied to this factor, in attempting a
further explanation of price changes.
VI 
It will be remembered that the period under consideration included
the abnormal war years. Changing conditions should be taken
into account in any analysis, but it seems justifiable that allowances
be made for abnormalities. Government rationing, price
guarantees, the feeling of patriotism in time of stress, and the depressed
condition of wheat-consuming areas due to war conditions
caused an abnormal situation in the supply of and the demand for wheat.
Furthermore, the withdrawal of Russia as an important wheat-producing
country in 1915, and the erratic fluctuations of production and
price level during the war, make it difficult to fit a trend that would
tend to prevail under less unusual circumstances. In light of these
difficulties it seems permissible to disregard the war years, in an effort
to determine the relationship of world production plus carry-over,
and of other factors, to price, and to obtain a forecasting formula
that will work under more normal conditions.

TABLE II
RESULTS OF CORRELATION ANALYSES

Variables
1. Purchasing Power (Average Price ÷ Index) (X₁)
2. World Production plus Carry-Over (X₂)
3. United States Acreage Harvested (X₃)
4. United States Exports (X₄)

Coefficients of Correlation
Zero order:    r12 = −.57    r13 = −.20    r14 = −.11    r23 = +.37    r24 = −.03    r34 = +.55
First order:   r12.3 = −.54   r12.4 = −.58   r13.2 = +.001   r13.4 = (illegible)   r14.2 = −.16   r14.3 = −.002
               r23.1 = +.32   r23.4 = +.46   r24.1 = −.12   r24.3 = −.30   r34.1 = +.54   r34.2 = +.60
Second order:  r13.24 = +.14   r14.23 = −.19   r23.14 = +.45   r24.13 = −.36   r34.12 = +.61

Multiple Correlation
R1.234 = .5902

Multiple Regression Equation
X₁ = −.0214 − 1.176 X₂ + .181 X₃ − .064 X₄







Not only will war years be left out of consideration but a refinement 
of method may prove to be of advantage. In correlating percentage 
deviations of supply with percentage deviations of price, a single item 
of supply is correlated with a single item of price, leaving out of account 
the relationship of that item of supply or price to previous or subse- 
quent items (except in so far as other observations are of influence in 
the determination of trend). This relationship to previous or subse- 
quent observations is probably of extreme importance. The change in 
supply from one year to the next is perhaps of greater significance to 
a change in price than is the absolute magnitude of supply to the 
absolute magnitude of price. The change in conditions from year to
year should be of more importance than simply the actual condition in
any year. To take account of this relationship between changes, first
differences of percentage deviations from trend, rather than simply
deviations from trend, were used for both world production plus
carry-over and price. The correlation between these series for the
pre-war years resulted in a coefficient of correlation of −.74, which is
appreciably higher than the correlation of percentage deviations for the
entire period (see Chart I).¹


[CHART I. WHEAT: purchasing power and supply.]

1 This method gives better results for the pre-war years than does the use of percentage deviations from trend.

[CHART II. WHEAT PRICES: actual price and forecast price, in dollars.]


What other factors can be taken into consideration which will 
further help in explaining price and raising this coefficient of correla- 
tion? Perhaps one of importance is the anticipated supply that will 
be forthcoming with the next year’s crop. Here again somewhat 
arbitrary methods must be used because of lack of information. Be- 
cause of the fact that anticipated change in production can have a 
noticeable effect on prices only for the last few months of the crop year, 
one-sixth of the first difference of the following year’s production was 
added to the first difference of the year in question, and this total was 
correlated with price. This, of course, assumes that the realized 
production for the following year represents the anticipated produc- 
tion, which, of course, it cannot exactly do, as there are changes during 
the growing season which cause realized production to deviate from 
anticipated production. However, it does give some indication of 
anticipated changes in conditions. The resultant coefficient of
correlation was −.68, lower than the −.74 obtained from the unadjusted
first differences, indicating that this adjustment does not help to explain
price and is of no value in forecasting.
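The anticipated-supply adjustment tried above can be written as a one-line transformation. One-sixth of the following year's first difference of production is added to the current year's, with realized production standing in for anticipated production, as the text assumes. In this sketch the weight is a parameter and the function name is hypothetical. The final year is dropped because its following year is unknown.

```python
def with_anticipated_supply(diffs, weight=1 / 6):
    """Add weight times next year's first difference to this year's.

    `diffs` are first differences of percentage deviations from trend of
    production plus carry-over; realized next-year values stand in for
    anticipated ones. The last year is dropped (no following year).
    """
    return [d + weight * nxt for d, nxt in zip(diffs, diffs[1:])]
```

The adjusted series would then be correlated with the first differences of price for the matching years, which is how the −.68 figure arises.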

The price of substitutes for wheat presented another possible 
opportunity for the explanation of wheat prices, and the price of the 
most important substitute, rye, was investigated. There seemed 
to be little definite relationship between the price of rye and the price 
of wheat. Furthermore, changes in world production of rye did not 
seem to explain the divergence in the years when price and production 
of wheat were the most remote from their normal relationship. Thus 







in 1903 and 1904, when the price of wheat was much higher than 
supply conditions seemed to warrant, the production of rye was much 
above its normal trend. In 1905 and 1906, when the price of wheat 
was much lower than supply conditions seemed to warrant, the pro- 
duction of rye was much below its normal trend. It appears that 
this factor can explain little. 

Of the factors so far studied (with the exception of imports into the 
principal importing countries, which factor was not further considered 
because of the short period of time for which the figures are available) 
world production plus carry-over seems to be of most significance in 
explaining changes in price. Its relationship to price appears to be 
definite enough so that changes in price can be foretold with some de- 
gree of accuracy. A regression equation of price on production was 
calculated and this was used as a forecasting formula. This regression 
equation, stated in terms of first differences of percentage deviations 
from trend is: 


$$X_1\,(\text{Price}) = .01648 - 1.4964\,X_2\,(\text{Production} + \text{Carry-Over})$$

(See Chart III.)


[Chart III. Wheat: Regression of Price on Production. Horizontal axis: production plus carry-over, first difference of percentage deviations from trend (-20 to +20). Points distinguish the years for which the regression was determined from later years.]






































This equation stated in actual price and supply figures is:¹

$$\frac{\dfrac{P_t}{I_t}-T^{P}_{t}}{T^{P}_{t}}-\frac{\dfrac{P_{t-1}}{I_{t-1}}-T^{P}_{t-1}}{T^{P}_{t-1}} = .01648-1.4964\left[\frac{S_t-T^{S}_{t}}{T^{S}_{t}}-\frac{S_{t-1}-T^{S}_{t-1}}{T^{S}_{t-1}}\right]$$

where P is the price, I the price index, T^P the trend of price, S production plus carry-over (Prod.+C.O.) and T^S its trend; the subscripts t and t-1 denote this year and the previous year.
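Because price enters the equation as a first difference of deviations from trend, the forecast for the current year depends on last year's observed deviation. A minimal Python sketch that solves the equation for this year's price follows; it assumes, as the equation indicates, that price is deflated by a general price index before comparison with its trend, that the current index is known or estimated, and all argument names are hypothetical.

```python
INTERCEPT = 0.01648  # constant term of the regression equation
SLOPE = -1.4964      # coefficient on supply (production plus carry-over)


def pct_dev(value, trend):
    """Percentage deviation from trend, expressed as a fraction."""
    return (value - trend) / trend


def forecast_price(prev_price, prev_index, prev_price_trend,
                   index, price_trend,
                   supply, supply_trend, prev_supply, prev_supply_trend):
    """Solve the actual-figures equation for this year's price.

    Prices are deflated by a general price index before being compared
    with their trend; the current index must be known or estimated.
    """
    d_supply = pct_dev(supply, supply_trend) - pct_dev(prev_supply, prev_supply_trend)
    d_price = INTERCEPT + SLOPE * d_supply           # forecast first difference
    prev_dev = pct_dev(prev_price / prev_index, prev_price_trend)
    deflated = price_trend * (1 + prev_dev + d_price)
    return deflated * index                          # back to actual dollars
```

With supply changing exactly as its trend predicts, the forecast deviation simply rises by the intercept, .01648; a supply 10 per cent further above trend than last year lowers it by about .15.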





To test this equation, forecast prices were compared with actual
prices, not only for the period from which it was derived (1897-1914)
but for the years 1897 to 1923. The results are recorded in Table
III and Chart II. It will be seen that the years 1899, 1903, 1904,
1905, 1917 and 1918 show a relatively wide discrepancy between
forecast and actual price, which is to be expected with a correlation
coefficient of only -.74. However, many other years show close
agreement between actual and forecast price. It is interesting to
note that the formula, derived from the conditions which prevailed
in 1897-1914, applies as accurately to present conditions as to the
period from which it was derived, the forecasts for 1921 and 1922
especially lying close to the price that











TABLE III
ERRORS IN PRICE FORECASTS

Year    Actual Price    Forecast Price    Error    Relative Error
1897        $.98             $.90          -.08         -.089
1898         .73              .76          +.03         +.04
1899         .69              .90          +.21         +.23
1900         .74              .83          +.09         +.11
1901         .73              .76          +.03         +.04
1902         .75              .68          -.07         -.10
1903         .87              .76          -.11         -.14
1904        1.10              .93          -.17         -.18
1905        1.13              .90          -.23         -.25
1906         .78              .95          +.17         +.18
1907         .99              .96          -.03         -.03
1908        1.12             1.05          -.07         -.067
1909        1.18             1.05          -.13         -.12
1910        1.02             1.21          +.19         +.16
1911        1.04             1.07          +.03         +.027
1912        1.02             1.01          -.01         -.01
1913         .93              .88          -.04         -.045
1914        1.26             1.20          -.06         -.05
1915        1.19             1.17          -.02         -.017
1916        1.80             1.75          -.05         -.03
1917        2.30             2.71          +.41         +.15
1918        2.35             1.96          -.39         -.20
1919        2.64             2.67          +.03         +.011
1920        2.08             2.41          +.33         +.14
1921        1.38             1.37          -.01         -.008
1922        1.25             1.24          -.01         -.008
1923        1.16             1.22          +.06         +.052




















1 Standard error of estimate =.121 
































actually prevailed. The forecast price for 1921 was $1.37 while the 
actual price was $1.38. For 1922 the forecast price was $1.24 while 
the actual price was $1.25. For 1923 there was a difference of about 
$.06, forecast price being $1.22 while actual price was about $1.16. 
The standard error of estimate for the five post-war years was 6.3
per cent, as against 12.5 per cent for the pre-war years and 11.4 per cent
for the entire period, 1897-1923.¹

In conclusion, the coefficient of correlation of -.74 indicates either
that not all of the factors explaining changes in price have been
discovered, or that the figures used did not accurately represent what
actually took place. The former is perhaps the more important reason.
A measure of change in demand should, if possible, be found. An index
of the economic condition of the principal wheat-importing countries
would undoubtedly help, if such could be obtained. Some further
consideration of available substitutes and of probable future supply
would be of value in explaining price, especially from the economist's
point of view. Such factors would probably be of much less value in
forecasting, because information concerning them would become
available too late. It will be noted that the information needed for
the present forecasting formula can be obtained in time for a forecast
to be made in early October.

The forecasting formula that has been derived, it must be admitted, 
cannot accurately foretell changes in price by knowledge of changes 
in world production plus carry-over. (Forecast prices vary, on the
average, 11.7 per cent from the actual price.) It is based on an
incomplete explanation of the forces causing price to change, and it rests,
as does any statistically derived forecasting formula, on the assumption
that the relationships which prevailed in the past will continue to
prevail in the future. Decisions based upon its results should be 
modified by liberal use of common sense in the consideration of condi- 
tions other than those included in the formula. It is hoped that further 
investigation will disclose additional factors that influence the price of 
wheat, and that this formula will soon be superseded by one which can 
more accurately foretell the future through knowledge of the past and 
present. 

¹ The standard error of estimate for the war years was practically the same (12.6 per cent) as that of the pre-war years, because the small errors for the years 1915 and 1916 counterbalance the [...] of the years 1917 and 1918.












A FREQUENCY CURVE ADAPTED TO VARIATION IN 
PERCENTAGE OCCURRENCE 


By Sewall Wright, University of Chicago¹












Frequency curves are used for at least two rather distinct purposes. 
One of these is to obtain the most accurate possible description of a 
body of data, smoothing out only those irregularities which may be due 
to chance. The use of curves for the gradation of life insurance data is 
an example. For such purposes, it is of little importance whether the 
formula used has a logical basis or is merely one that has been found 
empirically to give a good fit. The number of constants is also rela- 
tively immaterial. 

The second type of use is in connection with analysis of the nature 
of the factors determining variation. A curve with a simple logical 
basis and a small number of constants is desired. Close accuracy of fit 
is relatively unimportant. Flexibility is obtained by the choice of the 
type of curve instead of by a large number of constants and the type of 
curve is intended to reflect directly an hypothesis as to the nature of 
the variation. 

The use of curves of the type

$$y = a + bx + cx^{2} + dx^{3} + \cdots$$

to fit all distributions is an extreme illustration of a system of obtaining
a representation of data by means of an indefinite number of constants,
regardless of logical suitability.

The starting point in obtaining a logical basis for applying formulae
to a very large class of frequency distributions is the assumption that
the variation may be treated as determined by the combination of the
effects of more or less independent elementary factors. It is well
known that even if such factors vary rather greatly in their chances of
occurrence, in their contributions to the total effect, or in correlation
in occurrence, the distribution in general shows a close approach to the
normal probability curve

$$y = \frac{N}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\bar{x}}{\sigma}\right)^{2}}$$
¹ Written as a contribution from the Bureau of Animal Industry, United States Department of Agriculture.
² Compare E. B. Wilson, "The Development of a Frequency Function and Some Comments on Curve Fitting," Proceedings of the National Academy of Sciences, 10:79-84, 1924.


















The Pearson and Gram-Charlier curves have a broad logical basis 
which adapts them well for fitting distributions which depart from 
normality for reasons of the kind indicated above. As general systems, 
however, their flexibility is in many cases merely that which goes with 
four or more constants. 

We have assumed above that a given elementary factor always 
makes the same contribution to the total effect. There is often reason 
to believe, however, that in natural variation, the effect of one factor 
may depend on the presence and effect of other factors and especially 
on the total effect of the latter. In other words, the scale on which the 
variates are measured may not be in harmony with the nature of the 
factors. This type of complication is likely to produce wider depar- 
tures from normality than those due to differences among the factors 
in chance of occurrence, amount of contribution, or correlation in 
occurrence. 

The logical method of dealing with variation due to such factors is, 
as has frequently been pointed out, to transform the scale to one on 
which each factor has the same effect throughout the range. In practice,
the method is usually to find the transformation which will reduce the
family of distributions under study to normal distributions. The
general theory is as follows.¹

Let y = f(x) be the observed distribution.

Let

$$y' = f'(x') = \frac{N}{s\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x'-a}{s}\right)^{2}}$$

be the normal curve derived from transformation from scale x to scale x'.

Let dx/dx' = φ(x) represent the effect of an elementary factor on scale x relative to its uniform effect on scale x'.

Then

$$x' = \int\frac{dx}{\varphi(x)} = F(x)$$

represents the required transformation.

As the frequencies between corresponding abscissas on the two scales are not affected by the transformation, i.e., y dx = y' dx', we have

$$y = y'\,\frac{dx'}{dx} = \frac{y'}{\varphi(x)} = \frac{N}{s\sqrt{2\pi}}\,\frac{dF(x)}{dx}\,e^{-\frac{1}{2}\left[\frac{F(x)-a}{s}\right]^{2}} \qquad (1)$$

The most familiar transformation of this sort is that in which it is 
assumed that the elementary factors have constant percentage effects. 
1 Compare Arne Fisher, The Mathematical Theory of Probabilities, 1923, p. 236. 



















In this case φ(x) = x, F(x) = log x, and the formula of the observed
curves of the kind which transform into normal curves is simply

$$y = \frac{N}{s\sqrt{2\pi}}\cdot\frac{1}{x}\,e^{-\frac{1}{2}\left(\frac{\log x-a}{s}\right)^{2}}$$
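Equation (1) can be evaluated for any transformation once F(x) and its derivative are supplied. The sketch below, with hypothetical function names, applies it to the constant-percentage-effect case and checks numerically that the resulting curve still encloses unit area when N = 1; the midpoint-rule quadrature is an illustrative choice, not part of the paper.

```python
from math import exp, log, pi, sqrt


def transformed_density(x, F, dF, a, s, N=1.0):
    """Equation (1): ordinate on scale x of a curve that is normal on scale F(x)."""
    z = (F(x) - a) / s
    return N / (s * sqrt(2 * pi)) * dF(x) * exp(-0.5 * z * z)


def log_case(x, a, s, N=1.0):
    """Constant percentage effects: phi(x) = x, so F(x) = log x and dF/dx = 1/x."""
    return transformed_density(x, log, lambda t: 1.0 / t, a, s, N)


def midpoint_integral(f, lo, hi, n=200_000):
    """Crude midpoint-rule quadrature, used only to check the total area."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


area = midpoint_integral(lambda x: log_case(x, 0.0, 0.5), 1e-9, 20.0)
```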


The transformation is even simpler if the distribution is represented 
in the ogive form, using the running sum of the frequencies as ordi- 
nates, since frequencies are not affected by the transformation. 

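Let p be the portion of the area of the curve between the median and the required abscissa, x:

$$p = \frac{1}{N}\int_{\text{median}}^{x} y\,dx.$$

For the normal curve,¹

$$p = \int_{\bar{x}}^{x}\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\bar{x}}{\sigma}\right)^{2}}dx = \operatorname{prf}\frac{x-\bar{x}}{\sigma}.$$

The relation may be expressed in the form

$$\operatorname{prf}^{-1}p = \frac{x-\bar{x}}{\sigma} \qquad (2)$$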


which is linear with respect to the unknown constants. The abscissas 
plotted against the inverse probability functions of the integrated 
frequencies (from the median) should thus give a straight line in data 
which can be fitted by the normal curve. A distribution which be- 
comes normal on transformation of the scale may be represented by 
the formula 

$$\operatorname{prf}^{-1}p = \frac{F(x)-a}{s} \qquad (3)$$


The constants a and s are the same as those in equation (1) and are 
the mean and standard deviation of the distribution on the transformed 
scale. The probable success of a given transformation, F(x), may
easily be judged by plotting it against prf⁻¹p and observing the
approach to linearity. The constants a and s can be found by fitting a
straight line, provided that a satisfactory system of weighting the
observations can be found.
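The graphical test just described can also be given a rough numerical form. In this sketch, which is our assumption rather than the author's procedure, the correlation between F(x) and prf⁻¹p at the class upper limits stands in for visual inspection of linearity; prf⁻¹ is computed with Python's `statistics.NormalDist`, and the function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

UNIT = NormalDist()  # standard normal curve


def prf_inv(p):
    """Inverse of prf (area from the median): prf_inv(p) = Phi^-1(p + 1/2)."""
    return UNIT.inv_cdf(p + 0.5)


def linearity_points(upper_limits, counts, F):
    """Pairs (F(x), prf^-1 p) at the upper limit of each class.

    The last class is skipped (its running sum is always 100 per cent), as
    is any limit where the running sum is still 0, since prf^-1 is infinite there.
    """
    total, running, points = sum(counts), 0.0, []
    for u, c in zip(upper_limits[:-1], counts[:-1]):
        running += c
        p = running / total - 0.5
        if -0.5 < p < 0.5:
            points.append((F(u), prf_inv(p)))
    return points


def linearity_r(points):
    """Correlation of the plotted points; values near 1 suggest a straight line."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)
```

For data that really are normal on the log scale, `linearity_points(limits, counts, log)` gives points on a straight line and `linearity_r` returns a value indistinguishable from 1.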

The purpose of the present paper is to discuss a type of curve which 


¹ It is necessary for the purposes of the present paper to use a symbol for the probability integral. The established symbol, $\operatorname{erf} z = \frac{2}{\sqrt{\pi}}\int_0^z e^{-z^2}dz$, applies to a different form of the probability integral from that which is given in the tables most frequently used by biologists and is otherwise inconvenient in the present connection. For convenience in the actual fitting of data by use of these tables the form

$$\operatorname{prf} z = \frac{1}{\sqrt{2\pi}}\int_0^z e^{-\frac{z^2}{2}}dz$$

is used provisionally throughout this paper. In order to change the formulae into standard mathematical form, it is merely necessary to make the transformations $\operatorname{prf} z = \frac{1}{2}\operatorname{erf}\frac{z}{\sqrt{2}}$ and $\operatorname{prf}^{-1} z = \sqrt{2}\,\operatorname{erf}^{-1} 2z$.






























has been found to apply well in certain cases of variation in the per- 
centage of occurrence of an event. 

The peculiarities of a percentage scale may be seen from the following 
hypothetical example. Suppose that under a certain set of conditions 
one lot of eggs shows a 50 per cent hatch and another lot from a differ- 
ent source shows a 70 per cent hatch. Suppose that under a second 
set of conditions, the first lot shows an 85 per cent hatch. There is 
not room on the percentage scale for the second lot to show the 20 
per cent superiority exhibited under the first set of conditions. The 
question is: what degree of superiority is to be considered comparable?

The conditions which determine the hatchability of the individual 
eggs in a given experiment are variable. The state of certain eggs is 
so favorable that conditions may be very bad before the power of 
hatching is lost. Other eggs are such that a slight change in conditions 
shifts the balance between hatching and death in the shell. In still 
other cases, conditions must needs be very favorable before hatching 
may occur. In other words, back of the alternative categories hatching 
and failure to hatch, there is a graded array of internal states. 


[Figure I. Two overlapping normal probability curves, A and B]











As a first approximation, it should often be sufficient to assume 
that these internal states are distributed according to the normal curve 
on a scale on which the same elementary factor has a uniform effect. 
Let A in Figure I represent the distribution of the states of the eggs in 
the first lot. All below a certain critical level fail to hatch. The 
percentage hatch is measured by the area of the frequency curve above 
this point. The second lot is represented similarly by curve B. It is 
assumed that these eggs are affected in the main by the same factors 
and show the same amount of variation but that they have a different 
mean because of particular factors in which they differ from lot A. 
It is obvious that a given change in conditions, represented by shifting
both curves in one direction by the same amount, will in general have
a different effect on the percentage of hatch in the two lots. The
essential difference between the two means can best be measured by 
comparing the deviations from the means of unit normal curves, of 

















ordinates which cut off the observed percentage frequencies. In other 
words, the natural scale is obtained by using the inverse probability 
function of the excess of the percentage over 50 per cent (prf⁻¹x), in
place of this excess (x) itself.
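The egg example can be worked on this natural scale. Assuming, as in Figure I, that both lots have the same variability and that a change in conditions shifts both curves equally, the gap between the natural-scale values of the two hatch proportions stays fixed. The function names below are hypothetical; the standard normal quantile Φ⁻¹(h) is used, which equals prf⁻¹(h - ½).

```python
from statistics import NormalDist

UNIT = NormalDist()


def natural_scale(hatch):
    """Deviation (in standard deviations) of a lot's mean above the critical
    level when a fraction `hatch` lies above it, i.e. prf^-1(hatch - 1/2)."""
    return UNIT.inv_cdf(hatch)


def comparable_hatch(first_before, second_before, first_after):
    """Hatch of the second lot after conditions change, if both curves shift
    by the same amount so that their gap on the natural scale is unchanged."""
    gap = natural_scale(second_before) - natural_scale(first_before)
    return UNIT.cdf(natural_scale(first_after) + gap)


print(round(comparable_hatch(0.50, 0.70, 0.85), 3))
```

For the hypothetical figures in the text (50 and 70 per cent hatch, the first lot rising to 85 per cent), the comparable hatch for the second lot comes out at roughly 94 per cent, rather than the impossible 105 per cent that a constant 20-point difference would require.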

These considerations suggest that a frequency distribution in which
the variates are percentages, necessarily restricted to the range 0 to
100 per cent, may often be transformed into a curve of unlimited range,
approaching the normal curve in form, by replacing the observed scale
by the scale x' = prf⁻¹x. This transformation leads by equation (1)
to the equation

$$y = \frac{N}{s}\,e^{\frac{1}{2}(\operatorname{prf}^{-1}x)^{2}-\frac{1}{2}\left(\frac{\operatorname{prf}^{-1}x-a}{s}\right)^{2}} \qquad (4)$$


The significance of the function prf⁻¹ may perhaps be recognized
most readily by citation of the familiar example, prf⁻¹.25 = .6745,
which expresses the fact that the quartile deviation of a normal
probability curve is .6745 times its standard deviation. The function
prf⁻¹(.50 - x) is given directly in Table I of Pearson's Tables for
Statisticians and Biometricians, and prf⁻¹x may of course be obtained
inversely from tables of the normal probability integral such as Table
IV in Davenport's Statistical Methods.
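Today prf and prf⁻¹ can be computed directly rather than read from tables. A minimal sketch, using the footnoted identity prf z = ½ erf(z/√2) with Python's `math.erf`, and `statistics.NormalDist.inv_cdf` for the inverse (the function names simply mirror the paper's symbols):

```python
from math import erf, sqrt
from statistics import NormalDist

UNIT = NormalDist()


def prf(z):
    """Area of the unit normal curve from 0 to z (the paper's prf)."""
    return 0.5 * erf(z / sqrt(2))  # prf z = (1/2) erf(z / sqrt 2)


def prf_inv(p):
    """Inverse of prf, defined for -1/2 < p < 1/2."""
    return UNIT.inv_cdf(p + 0.5)


print(round(prf_inv(0.25), 4))  # 0.6745, the quartile-deviation example
```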

The curve may also be expressed in the form

$$\operatorname{prf}^{-1}p = \frac{\operatorname{prf}^{-1}x-a}{s} \qquad (5)$$

by derivation from (4) or from equation (3). There is thus a linear
relation between the inverse probability function of the running sum 
of the percentage frequencies (minus 50 per cent) and that of the 
corresponding abscissas (as deviation from 50 per cent). It is merely 
necessary to plot these two quantities against each other to judge the 
likelihood that the transformation will be a success. Indeed, the two 
constants a and s can readily be estimated from such a graph. The
intercept of the line on the x-axis (the axis of prf⁻¹x) gives a, and the
slope relative to the p-axis gives s. More accurate methods of
determination will be discussed later.

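The various forms assumed by the curve with different values of the
constants may be investigated from the slope equation, noting that

$$\frac{d}{dx}\operatorname{prf}^{-1}x = \sqrt{2\pi}\,e^{\frac{1}{2}(\operatorname{prf}^{-1}x)^{2}},$$

$$\frac{dy}{dx} = \frac{\sqrt{2\pi}\,N}{s}\,e^{(\operatorname{prf}^{-1}x)^{2}-\frac{1}{2}\left(\frac{\operatorname{prf}^{-1}x-a}{s}\right)^{2}}\left[\operatorname{prf}^{-1}x-\frac{\operatorname{prf}^{-1}x-a}{s^{2}}\right].$$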


The shapes of curves for different values of s and a are given in Figure
II. The simplest special case arises when s = 1. If in addition a = 0,
the curve is reduced to the straight line y = N between the limits
x = ±.50.

[Figure II. The forms taken by the curve $y = \frac{N}{s}e^{\frac{1}{2}(\operatorname{prf}^{-1}x)^{2}-\frac{1}{2}\left(\frac{\operatorname{prf}^{-1}x-a}{s}\right)^{2}}$ for different values of s and a; panels for a = 0.0, 0.5 and 1.0; horizontal axis 0 to 100 per cent]
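Equation (4) is easy to evaluate numerically, which is one way to reproduce shapes like those in Figure II. The sketch below (function names are ours) computes the ordinate and a crude midpoint-rule area; with s = 1 and a = 0 it returns the constant N, matching the special case just described.

```python
from math import exp
from statistics import NormalDist

UNIT = NormalDist()


def percentage_curve(x, a, s, N=1.0):
    """Ordinate of equation (4); x is the deviation from 50 per cent (-1/2 < x < 1/2)."""
    z = UNIT.inv_cdf(x + 0.5)  # prf^-1 x
    return (N / s) * exp(0.5 * z * z - 0.5 * ((z - a) / s) ** 2)


def area(a, s, n=100_000):
    """Midpoint-rule area under the curve with N = 1; should be close to 1."""
    h = 1.0 / n
    return h * sum(percentage_curve(-0.5 + (i + 0.5) * h, a, s) for i in range(n))
```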

Assuming a to be positive, inspection of the slope equation (for s = 1)
shows that the slope approaches +∞ as x approaches +.50 and also
approaches +∞ as x approaches -.50. This is a reversed curve,
rising abruptly from 0 at the lower limit, flattening out more or less,
and rising indefinitely at the upper limit.

Another special case arises when s² = ½. The shapes of the curves
for three values of a are given in Figure II, s = √.50. In the
symmetrical case (a = 0) the slope approaches -∞ as x approaches +.50
and approaches +∞ as x approaches -.50. This is a rather flat-topped
curve, rising abruptly from 0 at both limits. In case a is positive, the
slope approaches -∞ as x approaches +.50, as above, but approaches
0 as x approaches -.50.

For values of s² between ½ and 1, both symmetrical and asymmetrical
cases rise perpendicularly from 0 at both limits of the range, although
this may not be obvious from the graph if there is marked skewness.
(Figure II, s = √.75.)

For values of s² below ½ the slope is always 0 at the limits of the
range, and the form approaches that of the normal curve in the
symmetrical case. (Figure II, s = .25, s = .50.)

For values of s² above 1, the curve becomes U-shaped, the ordinate
becoming indefinitely great at each end of the range. (Figure II,
s = √1.25.)

A conspectus of the slopes at the two extremes, 0 per cent and 100
per cent (i.e., at x = -.50 and x = +.50), is given below.


























                  a Negative          a = 0            a Positive
                 0%      100%      0%      100%      0%      100%
s² < ½            0        0        0        0        0        0
s² = ½           +∞        0       +∞       -∞        0       -∞
½ < s² < 1       +∞       -∞       +∞       -∞       +∞       -∞
s² = 1           -∞       -∞        0        0       +∞       +∞
s² > 1           -∞       +∞       -∞       +∞       -∞       +∞
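The entries of the conspectus can be spot-checked by evaluating the slope equation just inside each limit. This sketch reports only the sign of dy/dx at x = ±(.50 - ε), not its limiting value, so it cannot separate rows where the slope tends to 0 from those where it tends to ±∞; ε and the function names are our assumptions.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

UNIT = NormalDist()


def slope(x, a, s, N=1.0):
    """dy/dx from the slope equation, x measured from 50 per cent."""
    z = UNIT.inv_cdf(x + 0.5)  # prf^-1 x
    bracket = z - (z - a) / s ** 2
    return sqrt(2 * pi) * N / s * exp(z * z - 0.5 * ((z - a) / s) ** 2) * bracket


def end_signs(a, s, eps=1e-6):
    """Signs of the slope just inside the 0 and 100 per cent limits."""
    def sign(v):
        return (v > 0) - (v < 0)
    return sign(slope(-0.5 + eps, a, s)), sign(slope(0.5 - eps, a, s))


for s2 in (0.75, 1.0, 1.25):
    print(s2, [end_signs(a, sqrt(s2)) for a in (-0.3, 0.0, 0.3)])
```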














It may appear surprising to find such a wide variety of curves all 
from the same simple transformation of a normal curve. The U- 
shaped curves indicate that caution is necessary in interpreting a two- 
peaked distribution as essentially heterogeneous where the peaks coin- 
cide with the limits of possible variation. The forms which the curve 
may take agree rather closely with the various sub-divisions of Pear- 
son’s type I curves. 















The formula does not lead to workable expressions for the mean and
higher moments. The median and mode, as well as the mid range, are,
however, easily dealt with. The abscissa (x), ordinate (y) and tail
frequency (½ - p) cut off by each can be obtained from prf⁻¹x, y and
prf⁻¹p, respectively.

                prf⁻¹x          y                           prf⁻¹p
Mid range       0               (N/s)e^(-a²/(2s²))          -a/s
Median          a               (N/s)e^(a²/2)               0
Mode            a/(1 - s²)      (N/s)e^(a²/(2(1 - s²)))     as/(1 - s²)
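The table translates directly into a small function. In this sketch (names hypothetical) each row is computed on the transformed scale and then converted back: x = prf(prf⁻¹x) and the tail frequency ½ - p = ½ - prf(prf⁻¹p). The mode row is returned only for s < 1, where the exponent of equation (4) is concave in prf⁻¹x and so has a single maximum.

```python
from math import exp
from statistics import NormalDist

UNIT = NormalDist()


def prf(z):
    """Area of the unit normal curve from 0 to z."""
    return UNIT.cdf(z) - 0.5


def landmarks(a, s, N=1.0):
    """Return {name: (x, y, tail)} from the table above.

    x is the abscissa as deviation from 50 per cent, y the ordinate, and
    tail the fraction of the total frequency cut off above the abscissa.
    """
    rows = {
        "mid range": (0.0, (N / s) * exp(-a * a / (2 * s * s)), -a / s),
        "median": (a, (N / s) * exp(a * a / 2), 0.0),
    }
    if s < 1:  # exponent concave in prf^-1 x, so this point is a maximum
        k = 1 - s * s
        rows["mode"] = (a / k, (N / s) * exp(a * a / (2 * k)), a * s / k)
    return {name: (prf(zx), y, 0.5 - prf(zp)) for name, (zx, y, zp) in rows.items()}
```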


METHODS OF FITTING 


The most direct method of fitting is simply to fit a normal curve
to the data by the method of moments after transforming the scale by
substituting prf⁻¹x for x (taken as deviation from the mid range).
The ordinates are of course affected by the transformation, but not the
frequencies, which can be used with the transformed abscissas to
calculate the mean (a) and the standard deviation (s) of the transformed
curve. The values of y and p in the observed curve can then be found
by equations (4) and (5), respectively. This is the method followed
in a previous paper¹ in calculating the constants for the distribution
of percentages of white in different families of guinea pigs.
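For ungrouped observations the method of moments is a two-line computation, and the class-mean difficulty discussed next does not arise. The following sketch assumes individual percentages strictly between 0 and 100; it is a simplification of the grouped procedure the paper describes, and the function name is hypothetical.

```python
from statistics import NormalDist, fmean, pstdev

UNIT = NormalDist()


def fit_by_moments(percentages):
    """Estimate a and s from individual observations given in per cent.

    Each value v is moved to the transformed scale prf^-1(v/100 - 1/2),
    which equals the standard normal quantile of v/100. Values of exactly
    0 or 100 per cent map to infinity and must be excluded beforehand.
    """
    transformed = [UNIT.inv_cdf(v / 100) for v in percentages]
    return fmean(transformed), pstdev(transformed)
```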

While very easy to carry out, this method suffers from the difficulty 
that the abscissas to the mean of each frequency class are required and 
must be established from the abscissas of the class limits. Moreover, 
the variation in class ranges, after transformation of the scale, makes 
Sheppard's corrections inapplicable. The difficulty is especially serious
near the limits ±.50, which are transformed into -∞ and +∞,
respectively.
The best method of fitting is usually that based on the formula
prf⁻¹p = (prf⁻¹x - a)/s, since p and x are given directly by the data. Any
method of fitting a straight line gives a means of calculating a and s.
The most serious difficulty with this method is in the weighting 
which should be given different values of prf⁻¹x and prf⁻¹p. The
relation between these quantities may be very irregular near the ex- 
tremes without appreciably affecting the goodness of fit of the curve 
determined from the linear relation over the greater portion of the 


¹ S. Wright, "The Relative Importance of Heredity and Environment in Determining the Piebald Pattern of Guinea Pigs," Proceedings of the National Academy of Sciences, 6:320-332, June, 1920.








[TABLE I. DETAILS OF FITTING CURVE TO DAVENPORT'S DATA ON MULATTO SKIN COLOR]




























frequencies. More than 95 per cent of the total frequency of the 
curve is included between the limits prf⁻¹p = ±2. It is thus obvious
that the deviation of a point near the median should be given greater
weight than one which cuts off a small frequency at one tail of the
curve. A method found empirically to give good results is to weight
each observation by the smaller of the two frequencies above or below
it. The weight is w = .5 - |p|, the sign of p being disregarded.
The linear regression of prf⁻¹p on prf⁻¹x may be calculated by the
usual product-moment formula. Let prf⁻¹x = x' and prf⁻¹p = p'.











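$$\bar{x}' = \frac{\sum w x'}{\sum w}, \qquad \bar{p}' = \frac{\sum w p'}{\sum w}$$

$$\sigma_{x'}^{2} = \frac{\sum w x'^{2}}{\sum w} - \bar{x}'^{2}, \qquad \frac{1}{s} = \frac{\dfrac{\sum w x' p'}{\sum w} - \bar{x}'\bar{p}'}{\sigma_{x'}^{2}}$$

$$a = \bar{x}' - s\,\bar{p}'$$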
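The weighted formulas above translate line for line into Python. A minimal sketch, with hypothetical names, taking lists of x' = prf⁻¹x, p' = prf⁻¹p and the weights w = .5 - |p|:

```python
def weight(p):
    """w = 1/2 - |p|: the smaller of the two tail frequencies at the point."""
    return 0.5 - abs(p)


def weighted_fit(x_prime, p_prime, weights):
    """Return (a, s) from the weighted product-moment formulas above."""
    sw = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, x_prime)) / sw
    mean_p = sum(w * p for w, p in zip(weights, p_prime)) / sw
    var_x = sum(w * x * x for w, x in zip(weights, x_prime)) / sw - mean_x ** 2
    cov = sum(w * x * p for w, x, p in zip(weights, x_prime, p_prime)) / sw - mean_x * mean_p
    s = var_x / cov  # the regression coefficient of p' on x' is 1/s
    a = mean_x - s * mean_p
    return a, s
```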


The details of this method of fitting are shown in Table I, using
Davenport's data¹ on the percentage of black in the skin color of
mulattoes, which Pearson uses as an illustration of fitting his type I
curve in Tables for Statisticians and Biometricians, pp. LXV-LXX.

The classes and their frequencies are given in columns 1 and 2. The
frequencies are reduced to a percentage basis (f) in column 3, and their
running sum minus 50 per cent (p) is given in column 4. Column 5 is
obtained by looking up the inverse probability functions of the entries
in column 4. The weights in column 6 are obtained by subtracting p in
column 4 from 50 per cent, neglecting its sign. Column 7 gives the upper
limits of the abscissas of the classes as deviations from 50 per cent.
Column 8 is the inverse probability function of these entries, and column
9 the same multiplied by the weights. The quantities $\bar{x}'$, $\sigma_{x'}^2$,
$\bar{p}'$, s and a can now be calculated from columns 5, 6, 8 and 9. The
fitted values of prf⁻¹p are calculated from the formula
prf⁻¹p = (prf⁻¹x - a)/s and put in column 10. Column 11 is obtained by
looking up the direct probability functions of the entries in column 10.
The fitted percentage frequencies (column 12) are the differences between
successive entries in column 11. The fitted frequencies (column 13)
can finally be calculated by multiplying the entries in column 12 by
the total number of cases and dividing by 100. Goodness of fit can be
found by Pearson's χ² method from the observed and fitted frequencies
(column 14).
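The whole column-by-column procedure can be condensed into one function. This sketch works with fractions rather than percentages, skips the class limits where prf⁻¹ is infinite (running sums of 0 or 100 per cent, and the 100 per cent abscissa), and computes χ² as Σ(o - c)²/c; the function names are ours, and it is an illustration of the steps, not a reproduction of Table I.

```python
from statistics import NormalDist

UNIT = NormalDist()


def prf(z):
    return UNIT.cdf(z) - 0.5


def prf_inv(p):
    return UNIT.inv_cdf(p + 0.5)


def fit_table(upper_limits, counts):
    """Fit from class upper limits (fractions in (0, 1]) and observed counts.

    Returns a, s, the fitted frequencies (column 13) and chi-square.
    """
    n = sum(counts)
    xs, ps, ws, running = [], [], [], 0.0
    for u, c in zip(upper_limits, counts):
        running += c
        p = running / n - 0.5                # column 4, as a fraction
        if u < 1 and -0.5 < p < 0.5:         # skip limits where prf^-1 is infinite
            ps.append(prf_inv(p))            # column 5
            ws.append(0.5 - abs(p))          # column 6
            xs.append(prf_inv(u - 0.5))      # columns 7 and 8
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    mp = sum(w * q for w, q in zip(ws, ps)) / sw
    var_x = sum(w * x * x for w, x in zip(ws, xs)) / sw - mx * mx
    cov = sum(w * x * q for w, x, q in zip(ws, xs, ps)) / sw - mx * mp
    s = var_x / cov
    a = mx - s * mp
    # Columns 10-11: fitted running sums at each upper limit (0 below, 1 at the top).
    cum = [0.0] + [1.0 if u >= 1 else prf((prf_inv(u - 0.5) - a) / s) + 0.5
                   for u in upper_limits]
    fitted = [n * (hi - lo) for lo, hi in zip(cum, cum[1:])]  # columns 12-13
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, fitted) if e > 0)
    return a, s, fitted, chi2
```

Fed frequencies generated exactly from equation (5), it recovers the generating a and s and gives χ² of essentially zero, which is a convenient check before applying it to real data.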


¹ C. B. Davenport, Heredity of Skin Color in Negro-White Crosses, Carnegie Institution of Washington, 1913.

































It is desirable to form a judgment of the possibility of a satisfactory 
fit by this method, by plotting the inverse probability function of p 
against that of x as soon as these have been tabulated. Figure IV 
shows a comparison of this graph with the straight line fitted to it by 
the method described. The fit is not perfect but is sufficiently good 
between the limits p'= +2 to suggest that the method should at least 
give a first order approximation to the frequencies. The χ² test¹
























































indicates a probability of about .10 that a random sample from the 
[Figure III. The frequencies of different percentages of black in the skin color of mulattoes (Davenport's data) as fitted by the curve of equation (4), with a = -.082 and s = .489; horizontal axis: percentage of black in skin color]

¹ The test is applied in the drastic form urged by R. A. Fisher, the number of classes being reduced by two to allow for the reduction in degrees of freedom due to fitting two constants. R. A. Fisher, "The Goodness of Fit of Regression Formulae and the Distribution of Regression Coefficients," Journal of the Royal Statistical Society, 85:597-612, 1922.





[Figure IV. The values of prf⁻¹p plotted against prf⁻¹x from Davenport's data on skin color, and fitted with a straight line]











TABLE II
THE FREQUENCIES OF DIFFERENT PERCENTAGES OF WHITE IN THE COAT
PATTERNS OF 10 INBRED GROUPS OF GUINEA PIGS (o), TOGETHER WITH THE
FITTED FREQUENCIES (c)



































































































Fer cent |¢ Family 2c IF ‘ly amily 1 m Family 
white ° c ° c ° c ° ° c 
O- a 2.7 0.1 1 0.0 
2.5- 1 8.9 1 0.8 1 0.4 
7.5- 9 12.1 2 1.7 0.9 
12.5- 10 14.6 2 2.9 1 1.5 
17.5= 18 17.2 6 4.5 0.2 3 2.5 
22.5- 22 19.1 7 6.2 0.3 0.1 3.3 
27.5< 26 21.4 + 8.1 1 0.5 0.2 2 4.5 
32.5- 22] 23.7 12 10.5 2 0.8 1 0.3 + 5.8 
37.5= 29] 26.29 18] 13.2 1.3 2 0.5 11 7.6 
42.5- 40] 28.5 13 16.2 2 1.9 0.9 6 9.2 
47,5< $1 21.1 19 19.8 1 2.9 1 1.4 9 11.4 
52.5- 24 v3.8 22 24.0 4) 4.3 1 2.3 17 13.9 
57,5- 41 360.7 26] 28.8 7 6.3 1 3.5 28 16.8 
62.5- 41 40.29 41 34.6 6 9.2 3 5.7 23 20.4 
67.5- 44} 44.2] 39 41.4 11 13.6 8 8.9] 29 24.5 
72.5= 60] 48.4] 58} 49.77 18] 20.2 17 14.3] 22 29.6 
77.5< 41] 53.99 46} 59.9] 34} 31.07 28] 23.7] 33] 35.9 
82.5- 58 60.8 79 73.2 65 49.2 45 41.07 28 44.0 
87.5- 82 70.6 7 88 91.4 98 84.7 85 77.8 66 55.2 
92.5= 84 87.5 #129 } 119.91 137] 175.87 165] 182.8 83 72.1 
97.5= 5f 63.5 73 78.1] 250) 233.8] 285 38 45.7 
No. 7451 745.0 # 685 | 685.0 | 636] 636.07 642 405] 405.0 
r cent] > Family o Family @ Family 05 joFamily Family oo 
white ° Cc ° Cc ° Cc 0 ° Cc 
0 - 3 0.0 27 8 11.4 
2.5< 3 0.9 2 0.38 51 19} 31.3 
7.5- 2 3.2 2 1.24 90 37] 31.9 
12.5- | @ 7.3 1 3.098 52 42} 30.1 
17.5- 0.3 7 12.5 7 5.98 39 30] 27.7 
22.5< 0.4 23 18.9 12 9.89 23 31 26.2 
27.5< 1 0.8 22 26.1 15 14.79 21 21 22.6 
32.5< 5 1.3 25 33.8]] 22 20.7 8 18 20.2 
37.5< 1 2.2 46/ 41.7 17 27.2 10 13.9 16 17.7 
42.5- 3 3.4 48 49.3] 34] 34.8 6 11.2 13 15.5 
47.5< 8 5.1 52 56.8] 36 42.7 12 8.8 9 13.4 
52.5- 10 7.54% 61 62.48 57] 50.8 1 6.9 10] 11.4 
57.5= 6 10.9 73 66.8 62 58.4 7 5.3 6 9.5 
62.5= 16] 15.67 79 69.89 66 65.5 5 4.0 13 7.9 
67.5- 20} 22.1 72 69.8 77 70.7 8 3.0 8 6.3 
72.5- 31} 51.2 80} 66.9 76| 73.8 1 2.0 3 4.8 
77.5- 32 43.5 53 60.7 72 73.1 3 1.3 3 3.6 
82.5- 49 60.9 50 50.2 64 67.2 0.8 5 2.4 
87.5- 126] 85.2) 27] 35.7% 55] 54.2 2 0.4 2 1.4 
92.5= 113} 118.0] 10 17.1 25] 31.2 1 0.2 1 0.6 
97.5- 46] 58.6 6 1.6 7 3: 0.1 

sro 751 



















theoretical population would give a worse fit. This is not a very good 
fit but it must be remembered that only two constants are used. It 
does not differ to an important extent from that obtained by Pearson’s 
four-constant curve, except in giving no frequencies below 0 per cent. 
The actual frequencies and the fitted curve are shown in Figure III. 

The principal object in using a curve of this kind is not, however, to 
obtain the best possible fit to an isolated distribution, but rather to 
bring a family of distributions under a common viewpoint. This use 
may be illustrated by some data on the percentage of white in the coat 
patterns of ten groups of guinea pigs consisting of the males and females 
of five different closely inbred families (Table II). Each of these
families is descended from a single mating after a considerable number 
of generations of brother-sister mating and is thus nearly free from 
genetic variation. The observed variation is due in the main to 
developmental irregularities and should not differ greatly in the ten 
groups. 

Superficially, however, the ten frequency distributions differ greatly
(see Figures VI and VII). Some are I-shaped curves that are positively
skew, others are I-shaped but negatively skew, and others again show a
mode at one extreme and at least approach closely to being J-shaped. It
would doubtless be possible to obtain reasonably good fits by using
Pearson or Gram-Charlier curves. The means, standard deviations,
and measures of skewness and kurtosis would, however, differ
greatly in the ten cases and would bring out no family resemblance.

As the scale is a percentage one, and as there are no cases in the
present families in which white was completely lacking and but few
black-eyed whites, in spite of the piling up near the extremes in many
cases, the percentage curves of this article seemed worth trying. The
theoretical basis is the hypothesis that behind the alternative
categories, color and white, as applied to a particular spot in the skin,
there is a graded series of determining conditions. To put the matter
to a rough test, the quantities prf⁻¹ p and prf⁻¹ x were looked up and
plotted against each other. The results for the ten curves are shown in
Figure V. On the whole, it appears that a series of straight lines should
give a series of first-order approximations. Indeed, such lines would
differ relatively little in slope (on which the constant s depends) and
mainly in the one constant a. In other words, the transformation should
reduce all ten distributions to normal curves with standard deviations
of the same order but differing greatly in their means. The great
differences in the superficial appearances of the


¹ S. Wright, loc. cit.; and “The Effects of Inbreeding and Crossbreeding on Guinea Pigs,” Bulletins
No. 1090 and 1121, United States Department of Agriculture, 1922.














[Figure V. The values of prf⁻¹ p plotted against prf⁻¹ x, from data on the percentage of white in the coat color patterns of the males and females of five inbred families of guinea pigs.]

[Figure VI. The frequencies of different percentages of white in the coat patterns of the males and females of a family of guinea pigs, as fitted by curves of the type discussed in the text.]

[Figure VII. The frequencies of different percentages of white in the coat patterns of males and females of four inbred families of guinea pigs, as fitted by curves of the type discussed in the text.]


distributions are thus in the main reduced to differences in one element, 
the mean tendency toward white. 
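The transformation can be made concrete with a short sketch. It assumes that prf denotes the normal probability integral, so that prf⁻¹ p for a percentage p of white is normally distributed with mean a and standard deviation s. This is our reading of the text, not the author's stated procedure. The class limits and the constants a = 0.5, s = 0.75 are illustrative values, not fitted ones. The sketch computes expected class frequencies only and does not reproduce the weighted fitting behind Figures VI and VII.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # assumption: prf and prf^-1 are the standard normal cdf and its inverse


def probit(percent):
    """prf^-1 of a percentage; 0 and 100 per cent map to -inf and +inf."""
    if percent <= 0:
        return float("-inf")
    if percent >= 100:
        return float("inf")
    return STANDARD.inv_cdf(percent / 100)


def expected_class_frequencies(a, s, total, edges):
    """Expected numbers in the percentage classes [edges[j], edges[j+1]) if
    prf^-1(p) is normal with mean a and standard deviation s (assumed model)."""
    transformed = NormalDist(a, s)
    cum = [transformed.cdf(probit(e)) for e in edges]  # cdf(-inf) = 0, cdf(inf) = 1
    return [total * (hi - lo) for lo, hi in zip(cum, cum[1:])]


# Hypothetical constants and class limits 0, 2.5, 7.5, ..., 97.5, 100 per cent
edges = [0.0] + [2.5 + 5 * j for j in range(20)] + [100.0]
freqs = expected_class_frequencies(0.5, 0.75, 100, edges)
```

A larger a shifts the whole curve toward high percentages of white while s keeps its spread on the transformed scale. In this model, a family difference therefore shows up mainly as a shift in a.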

The distributions as actually fitted are shown in Figures VI and VII.
It will be seen that a reasonably close first approximation has been
obtained in most cases. In two cases, the males of Family 39 and
the females of Family 32, the χ² test does not, however, indicate a
satisfactory result from the standpoint of random sampling. The
other eight cases, with probabilities ranging from .01 to .63, might more
or less readily result from such sampling. In any case, the method of
estimating the percentage of white from drawings made on a rubber-stamp
outline is hardly accurate enough to justify the effort to obtain
a closer fit by more elaborate methods.

The frequency constants of the ten distributions, and their probabilities
from the χ² test, are given in the following table. In calculating
the latter, end classes were combined so as to leave no frequencies
under 2.

Family | Sex | a | s | Probability
(Most entries of this table are illegible in the source scan. One legible
row, for a family and sex that cannot be identified, gives a = 1.007,
s = .749 and a probability beginning .000,000,00.)

The most important differences between the distributions are
brought out by the values of a, the means of the transformed normal
curves. Among other things, this comparison shows the characteristic
difference between males and females, averaging 0.26 on the scale
used. The differences in the values of s, the standard deviation of the
transformed normal curve, suggest questions for further investigation.
Family 35, with the smallest s, is probably the most homogeneous of
the families genetically, being descended from a single mating in the
twelfth generation of brother-sister mating. Family 2, with the largest
s, is probably the least homogeneous, being merely descended from a
mating in the sixth generation.¹
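The combining of end classes and the χ² calculation can be sketched as below. The sketch assumes that "frequencies under 2" refers to the calculated frequencies and that end classes are merged inward one at a time. The counts are hypothetical and the function names illustrative.

```python
def pool_end_classes(observed, expected, minimum=2.0):
    """Merge end classes inward until both end classes have an expected
    frequency of at least `minimum` (interior classes are left alone)."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[0] < minimum:
        o, e = obs.pop(0), exp.pop(0)   # fold the lowest class into its neighbour
        obs[0] += o
        exp[0] += e
    while len(exp) > 1 and exp[-1] < minimum:
        o, e = obs.pop(), exp.pop()     # fold the highest class into its neighbour
        obs[-1] += o
        exp[-1] += e
    return obs, exp


def chi_square(observed, expected):
    """Pearson's chi-square: sum of (o - e)^2 / e over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, only to show the mechanics
obs, exp = pool_end_classes([1, 3, 10, 4, 0], [0.5, 1.0, 9.0, 3.0, 1.5])
stat = chi_square(obs, exp)
```

To turn the statistic into a probability like those in the table, you also need the number of degrees of freedom and the χ² distribution. This sketch leaves both out.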


¹ Since the above was written, pedigree analyses of coat pattern have been made in Families 35 and 2.
No important differences have been found among the lines of Family 35 which trace to the single mating 
referred to above. Family 2, on the other hand, divides into two approximately equal major branches, 
separate since the seventh generation, which differ greatly in average amount of white. The averages 
for males and females, respectively, were 53.2 per cent and 69.2 per cent in one branch and 80.2 per cent 
and 85.5 per cent in the other. It is evident that heterogeneity has arisen, either by segregation or 
mutation, sufficient to account for the high value of s. 









































A test of the method of weighting the observations can be obtained
by tabulating the contributions to χ² from classes with different median
values of p. Any serious error in the method should be revealed by a
lack of uniformity in these contributions, since the object of weighting
is to obtain uniformity in goodness of fit for all parts of the frequency
distribution. The data used are those from the eight guinea-pig families
which showed a reasonably good fit, and Davenport's data on mulatto skin
color. As the numbers of classes do not differ greatly, the contributions
of all are combined to obtain the average contribution for
values of p at intervals of 5 per cent.

















Number of classes | Average contribution to χ²
9 | 2.30
7 | 1.29
9 | 0.68
8 | 0.92
8 | 0.92
11 | 1.21
14 | 1.78
13 | 0.65
18 | 1.26
56 | 1.06
(The values of p labelling the rows, at intervals of 5 per cent, are illegible in the source scan.)








For 41 classes whose medians fall between the upper and the
lower quartiles, the average contribution to χ² is 1.23. For 56 classes
whose medians are beyond the quartiles but not beyond ±45
per cent from the median of the whole distribution, the average
contribution is practically identical, 1.24. Finally, for 56 classes whose
medians cut off less than 5 per cent of the total frequency, the average
contribution is slightly less, 1.06. Apparently the method of weighting
gives a very slightly better fit near the extremes than elsewhere, but
on the whole the closeness of fit seems sufficiently uniform.
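The grouped averages quoted above can be checked against the table. The p labels are illegible, so the grouping here is an inference: the first five rows are taken as the classes whose medians fall between the quartiles, and the next four as those beyond the quartiles but within ±45 per cent. The last row is taken as the extreme classes. On that assumption, the counts (41, 56, 56) and the averages (1.23, 1.24, 1.06) come out as stated in the text.

```python
# (number of classes, average contribution to chi-square), in table order
ROWS = [(9, 2.30), (7, 1.29), (9, 0.68), (8, 0.92), (8, 0.92),
        (11, 1.21), (14, 1.78), (13, 0.65), (18, 1.26), (56, 1.06)]


def pooled_average(rows):
    """Total number of classes and their class-weighted average contribution."""
    count = sum(n for n, _ in rows)
    return count, sum(n * avg for n, avg in rows) / count


inner = pooled_average(ROWS[:5])    # assumed: medians between the quartiles
middle = pooled_average(ROWS[5:9])  # assumed: beyond the quartiles, within ±45 per cent
outer = pooled_average(ROWS[9:])    # medians cutting off less than 5 per cent
```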











a @® 


>) | 


cor VS GO OD — 














61] Spurious Correlation and Its Significance to Physiology 179


SPURIOUS CORRELATION AND ITS SIGNIFICANCE TO 
PHYSIOLOGY 


By E. J. Gumbel, University of Heidelberg, Germany


(1) Functional and Spurious Correlation 

(2) The Variation of a Product and of a Sum of Variables 

(3) Spurious Correlation between a Variable and a Product of 
Variables 

(4) Spurious Correlation between a Variable and a Sum of Variables 

(5) Application to a Problem of Physiology 

(6) Application to a Problem of Skull Measurement 


(1) The coefficient of correlation r gives a measure of the connection
between two variables in the sense of the theory of probability.

The two variables may be called X and Y, and their deviations from
the arithmetic means x and y. Then the coefficient of correlation is
defined as

$$r_{xy} = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}},$$

where the sums are taken over all observed values of x and y.

If great values of one variable are always connected with great values
of the other, or always with small values, the coefficient reaches its
limiting value +1 or −1 respectively. These cases will be called perfect
positive and perfect negative correlation. In the case of no correlation
the coefficient is, by definition, equal to zero.
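The definition translates directly into code. The following is a minimal sketch with an illustrative function name.

```python
from math import sqrt


def correlation(xs, ys):
    """r = Σxy / sqrt(Σx² · Σy²), where x and y are deviations from the arithmetic means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    x = [v - mx for v in xs]
    y = [v - my for v in ys]
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / sqrt(sum(a * a for a in x) * sum(b * b for b in y))
```

Great values paired with great values give +1, great values paired with small ones give −1, and a pattern with no linear connection gives 0.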

If the correlated variables are themselves functions of other (old)
variables, they will be called new variables, and their coefficient a
functional coefficient. This coefficient, too, is zero in the case of no
correlation: if all the old variables are uncorrelated among themselves,
the functional coefficient will also be zero.

These theorems hold only if no old variable appears in both of the
new variables. If at least one of the original variables appears in both
of the new ones, the coefficient will be called a functional coefficient
with repetition of variables. In this case, even if all the old variables
are uncorrelated, the functional coefficient will not be zero. The reason
is simply that, besides the uncorrelated variables, there exist two
variables of perfect correlation, i.e., identical variables. The two new
variables may even show perfect correlation if, to choose an extreme
example, they are identical or reciprocal.

If two such new variables have been observed and their coefficient
of correlation has been calculated numerically, this coefficient is not
by itself an adequate measure of correlation. We must be careful,
because it contains a spurious correlation. This spurious correlation is
introduced purely formally by the way the old variables enter into the
new ones.

Spurious correlation may be defined as that part of a functional
coefficient of correlation with repetition of variables which does not
vanish when all the old variables are considered as uncorrelated. In a
functional coefficient without repetition of variables this spurious
correlation is zero.

If we wish to calculate the spurious correlation, we must express
the functional coefficient in terms of the original correlations between
the old variables which it contains, and of the arithmetic means and
standard deviations of these variables. The value which remains when
all these correlation coefficients are set equal to zero is the spurious
correlation.
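The definition can be illustrated numerically. The sketch below draws three mutually uncorrelated variables X₁, X₂, X₃, assumed normal with mean 10 and standard deviation 1, using an arbitrary seed. It then correlates the new variables X₁/X₃ and X₂/X₃, which both contain X₃. The old variables are uncorrelated, yet the new ones show a correlation of about one half. That is the usual first-order value when all three variations are equal, and all of it is spurious.

```python
import random
from math import sqrt


def correlation(xs, ys):
    """Product-moment coefficient from deviations about the means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(1926)
N = 20_000
x1 = [rng.gauss(10, 1) for _ in range(N)]
x2 = [rng.gauss(10, 1) for _ in range(N)]
x3 = [rng.gauss(10, 1) for _ in range(N)]

r_old = correlation(x1, x2)                           # old variables: close to 0
r_new = correlation([a / c for a, c in zip(x1, x3)],  # new variables sharing X3
                    [b / c for b, c in zip(x2, x3)])
```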

Let us, then, proceed to calculate the spurious correlation which
results from correlating a variable $X_i$ with a product and with a sum
of n variables $X_\nu$, each of which carries the exponent $k_\nu$. The
variable $X_i$ will be called the distinguished variable, and the exponents
$k_\nu$ the weights. The functional coefficient will be denoted by
$\rho$ and the spurious correlation by $\rho_s$.

If the weight of the distinguished variable is zero and it is not
identical with any other variable, we have a functional correlation
without repetition of variables. If, on the contrary, the weight of
the distinguished variable is not equal to zero, or the distinguished
variable is identical with any other one, we have a functional correlation
with repetition of variables.

Mathematically speaking, our problem is to express the functional
coefficients of correlation $\rho(X_i, \Pi X_\nu^{k_\nu})$ and $\rho(X_i, \Sigma X_\nu^{k_\nu})$
in terms of the original correlations $r_{\kappa\lambda}$ between the variables
$X_\kappa$ and $X_\lambda$, and in terms of the variations or standard deviations
of these variables. By the variation $v_\nu$ we understand the standard
deviation $\mu_\nu$ divided by the arithmetic mean $m_\nu$ of the variable $X_\nu$.

(2) First of all we shall express the variations of the new variables
$\Pi X_\nu^{k_\nu}$ and $\Sigma X_\nu^{k_\nu}$ in terms of the variations of the old
variables $X_\nu$. For this purpose we need the arithmetic mean $M_1$ and
the standard deviation $\sigma_1$ of $\Pi X_\nu^{k_\nu}$, and the arithmetic mean
$M_2$ and the standard deviation $\sigma_2$ of $\Sigma X_\nu^{k_\nu}$.


















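Let us take the n variables $X_1, X_2, \dots, X_\nu, \dots, X_n$ and form from
them a function $f(X_1, X_2, \dots, X_n)$. Let $f, f_\nu, f_{\nu\nu}, f_{\kappa\lambda}$ denote
the value of this function at the arithmetic means and its first and
second partial derivatives there, as they appear in Taylor's series.
Suppose that the mathematical expectations of the third and higher
powers of the deviations are negligible compared with the arithmetic
means. Then the arithmetic mean M of this function is, according to
Czuber,¹ given by

$$M = f + \tfrac12\sum_\nu f_{\nu\nu}\,\mu_\nu^2 + \sum_{\kappa<\lambda} f_{\kappa\lambda}\,\mu_\kappa\mu_\lambda r_{\kappa\lambda}, \tag{1}$$

the first sum being taken over all variables from 1 to n, the second
over all pairs of different values of κ and λ.

On the same supposition, the square of the standard deviation σ of
the variable $f(X_1, X_2, \dots, X_n)$ will be

$$\sigma^2 = \sum_\nu f_\nu^2\,\mu_\nu^2 + 2\sum_{\kappa<\lambda} f_\kappa f_\lambda\,\mu_\kappa\mu_\lambda r_{\kappa\lambda}.$$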

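For our new variable $\Pi X_\nu^{k_\nu}$ we have

$$f = \Pi m_\nu^{k_\nu};\quad f_\nu = \frac{k_\nu}{m_\nu}\,f;\quad f_{\nu\nu} = \frac{k_\nu(k_\nu-1)}{m_\nu^2}\,f;\quad f_{\kappa\lambda} = \frac{k_\kappa k_\lambda}{m_\kappa m_\lambda}\,f. \tag{2}$$

Therefore $M_1$ will be

$$M_1 = \Pi m_\nu^{k_\nu}\Big(1 + \tfrac12\sum_\nu k_\nu(k_\nu-1)\,v_\nu^2 + \sum_{\kappa<\lambda} k_\kappa k_\lambda\, v_\kappa v_\lambda r_{\kappa\lambda}\Big),$$

and $\sigma_1^2$ will be

$$\sigma_1^2 = \Pi m_\nu^{2k_\nu}\Big(\sum_\nu k_\nu^2 v_\nu^2 + 2\sum_{\kappa<\lambda} k_\kappa k_\lambda\, v_\kappa v_\lambda r_{\kappa\lambda}\Big). \tag{3}$$

Therefore the variation V of $\Pi X_\nu^{k_\nu}$ is given by

$$V^2(\Pi X_\nu^{k_\nu}) = \frac{\sum_\nu k_\nu^2 v_\nu^2 + 2\sum_{\kappa<\lambda} k_\kappa k_\lambda\, v_\kappa v_\lambda r_{\kappa\lambda}}{\Big(1 + \tfrac12\sum_\nu k_\nu(k_\nu-1)\,v_\nu^2 + \sum_{\kappa<\lambda} k_\kappa k_\lambda\, v_\kappa v_\lambda r_{\kappa\lambda}\Big)^2}. \tag{4}$$

So the variation of a product is expressed by the variations of the
single variables alone. Since the variations of variables of different
size are generally of the same order, the size of the individual
variables has no influence on the size of the variation of their product.

In case all weights $k_\nu$ are equal to 1, the variation of a product of
n variables is given by

$$V^2(\Pi X_\nu) = \frac{\sum_\nu v_\nu^2 + 2\sum_{\kappa<\lambda} v_\kappa v_\lambda r_{\kappa\lambda}}{\Big(1 + \sum_{\kappa<\lambda} v_\kappa v_\lambda r_{\kappa\lambda}\Big)^2}. \tag{5}$$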

Generally the variation $v_\nu$ is about 10 per cent. If all correlations
are moderate and the number of variables is small, the sum in the
denominator may be neglected in comparison with 1. If all variations
are moderate, the number of variables small, and all correlations
perfect, the numerator of (5) becomes $(\sum v_\nu)^2$ and we have approximately

$$V(\Pi X_\nu) = \sum_\nu v_\nu.$$

¹ Czuber, Statistische Forschungsmethoden, pp. 150, 151. L. W. Seidel, Vienna, 1921.

The variation of a product of completely correlated variables is thus
equal to the sum of the variations of the single variables. The
variation of the nth power of a variable is therefore equal to n times
the variation of this variable, and the variation of a product of
completely correlated variables increases proportionately with the
number of variables.

If, on the contrary, all the correlations are zero, we get

$$V^2(\Pi X_\nu) = \sum_\nu v_\nu^2.$$

The square of the variation of a product of uncorrelated variables is
equal to the sum of the squares of the variations of these variables.
Therefore the variation of a product of uncorrelated variables whose
variations are of the same size increases only with the square root
of the number of variables.¹
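The two limiting cases can be checked by simulation. The sketch below assumes normally distributed factors with equal variations v = 0.05 but very different means, and uses an arbitrary seed. The variation of the product should come out near √n·v when the factors are uncorrelated. It should come out near n·v when a single common deviate drives every factor, which makes the correlations perfect.

```python
import random
from math import prod, sqrt


def variation(values):
    """Standard deviation divided by the arithmetic mean."""
    m = sum(values) / len(values)
    return sqrt(sum((x - m) ** 2 for x in values) / len(values)) / m


rng = random.Random(5)
N, v = 20_000, 0.05
means = [10.0, 20.0, 5.0, 1.0]   # very different sizes, equal variations
n = len(means)

# Uncorrelated factors: an independent normal deviate for each factor.
independent = [prod(m * (1 + v * rng.gauss(0, 1)) for m in means) for _ in range(N)]

# Completely correlated factors: one shared deviate drives every factor.
correlated = []
for _ in range(N):
    z = rng.gauss(0, 1)
    correlated.append(prod(m * (1 + v * z) for m in means))

cv_independent = variation(independent)  # compare with sqrt(n) * v = 0.10
cv_correlated = variation(correlated)    # compare with n * v = 0.20
```

The means differ by a factor of twenty but do not enter either result, which agrees with the remark that the size of the factors does not affect the variation of a product.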

If, on the other hand, $k_1 = -n$, $k_2 = 1$ and all the other weights are
zero, the variation of this quotient is, denoting by r the correlation
between $X_1$ and $X_2$,

$$V^2\!\left(\frac{X_2}{X_1^{\,n}}\right) = \frac{n^2 v_1^2 + v_2^2 - 2n\,v_1 v_2 r}{\Big(1 + \tfrac12 n(n+1)\,v_1^2 - n\,v_1 v_2 r\Big)^2}.$$

Therefore, if n and the variations are small, we get in first approximation

$$V^2\!\left(\frac{X_2}{X_1^{\,n}}\right) = n^2 v_1^2 + v_2^2 - 2n\,v_1 v_2 r. \tag{6}$$

If n is positive, the variation of the quotient diminishes with increasing
correlation; if n is negative, it increases. In case of complete
correlation between $X_1$ and $X_2$ we have

$$V\!\left(\frac{X_2}{X_1^{\,n}}\right) = |\,v_2 - n v_1\,| \quad \text{if } r = +1.$$
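The following sketch compares formula (6) with simulation under assumed values n = 2, v₁ = v₂ = 0.05 and r = 0.5. The correlated pair is built from two independent normal deviates in the usual way, and the means 10 and 50 are arbitrary. The last line evaluates the r = +1 case, where the first-order variation reduces to |v₂ − n v₁|.

```python
import random
from math import sqrt


def variation(values):
    """Standard deviation divided by the arithmetic mean."""
    m = sum(values) / len(values)
    return sqrt(sum((x - m) ** 2 for x in values) / len(values)) / m


def quotient_variation(n, v1, v2, r):
    """First-order variation of X2 / X1**n, from formula (6)."""
    return sqrt(n * n * v1 * v1 + v2 * v2 - 2 * n * v1 * v2 * r)


rng = random.Random(6)
n, v1, v2, r = 2, 0.05, 0.05, 0.5
samples = []
for _ in range(20_000):
    z1 = rng.gauss(0, 1)
    z2 = r * z1 + sqrt(1 - r * r) * rng.gauss(0, 1)   # correlation r with z1
    x1 = 10.0 * (1 + v1 * z1)
    x2 = 50.0 * (1 + v2 * z2)
    samples.append(x2 / x1 ** n)

simulated = variation(samples)
predicted = quotient_variation(n, v1, v2, r)   # sqrt(0.0075), about 0.0866
perfect = quotient_variation(n, v1, v2, 1.0)   # |v2 - n v1| = 0.05
```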
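Similarly to (4) we calculate the variation of a sum. In this case

$$f = \sum_\nu m_\nu^{k_\nu};\quad f_\nu = k_\nu m_\nu^{k_\nu-1};\quad f_{\nu\nu} = k_\nu(k_\nu-1)\,m_\nu^{k_\nu-2};\quad f_{\kappa\lambda} = 0. \tag{7}$$

Therefore, following (1),

$$M_2 = \sum_\nu m_\nu^{k_\nu} + \tfrac12\sum_\nu k_\nu(k_\nu-1)\,m_\nu^{k_\nu-2}\mu_\nu^2,$$

and, following (3),

$$\sigma_2^2 = \sum_\nu k_\nu^2 m_\nu^{2k_\nu-2}\mu_\nu^2 + 2\sum_{\kappa<\lambda} k_\kappa k_\lambda\, m_\kappa^{k_\kappa-1} m_\lambda^{k_\lambda-1}\mu_\kappa\mu_\lambda r_{\kappa\lambda}.$$

¹ This explains the great variation of skull capacity, well known in anthropometry. To be measured adequately, this
capacity is calculated as the product of skull length, breadth and height, and its variation is therefore greater than
the variation of each of these fairly independent values.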
















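Therefore we get for the variation of a sum

$$V^2(\Sigma X_\nu^{k_\nu}) = \frac{\sum_\nu k_\nu^2 m_\nu^{2k_\nu} v_\nu^2 + 2\sum_{\kappa<\lambda} k_\kappa k_\lambda\, m_\kappa^{k_\kappa} m_\lambda^{k_\lambda} v_\kappa v_\lambda r_{\kappa\lambda}}{\Big(\sum_\nu m_\nu^{k_\nu} + \tfrac12\sum_\nu k_\nu(k_\nu-1)\,m_\nu^{k_\nu} v_\nu^2\Big)^2}. \tag{8}$$

Contrary to the case of a product, not only the variations but also the
arithmetic means play a rôle.

In case all weights are equal to 1, we get for the variation of a sum

$$V^2(\Sigma X_\nu) = \frac{\sum_\nu \mu_\nu^2 + 2\sum_{\kappa<\lambda}\mu_\kappa\mu_\lambda r_{\kappa\lambda}}{\big(\sum_\nu m_\nu\big)^2}.$$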


The variation of a sum does not depend upon the variations of the
summands but upon their standard deviations. Now if the variables
are of different size, the same applies to the standard deviations.
Therefore the variation of a sum, contrary to the variation of a
product, depends on the sizes of the separate variables. Suppose, for
instance, that there exists a variable of such a size that, compared to
the square of its arithmetic mean, the squares of the other arithmetic
means vanish. Then the same relation will hold for the standard deviations.
If this exceptional variable is called $X_i$, we get approximately

$$V(\Sigma X_\nu) = \frac{\mu_i}{m_i} = v_i, \tag{9}$$

i.e., the variation of the sum is equal to the variation of the exceptional
variable. If all the correlations are perfect, the variation will be

$$V(\Sigma X_\nu) = \frac{\sum_\nu \mu_\nu}{\sum_\nu m_\nu} = \frac{\sum_\nu v_\nu m_\nu}{\sum_\nu m_\nu},$$

i.e., the variation of a sum of completely correlated variables is equal
to the weighted mean of the separate variations, the arithmetic means
of the various variables serving as weights. The variation of a sum is
consequently again of the size of the variation of one of the summands.
If the length of an organ and that of a part of it are highly correlated,
it is of little importance for the variation whether the whole length is
measured or only the part. If all the variables are identical, the
variation of the n-fold value of a variable becomes equal to the
variation of that variable.

If, on the contrary, all the variables are uncorrelated, the variation will be

$$V^2(\Sigma X_\nu) = \frac{\sum_\nu \mu_\nu^2}{\big(\sum_\nu m_\nu\big)^2}.$$

If, besides, all variations are approximately equal to v, we get approximately

$$V^2(\Sigma X_\nu) = \frac{v^2 \sum_\nu m_\nu^2}{\big(\sum_\nu m_\nu\big)^2}.$$

If, furthermore, all arithmetic means are about equal, we get approximately

$$V(\Sigma X_\nu) = \frac{v}{\sqrt n}.$$

Therefore the variation of a sum of n uncorrelated variables of the
same size decreases with the square root of their number.

The variation of a sum and the variation of a product therefore show
totally different properties.
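The contrast between sums and products shows up in a small simulation. The sketch assumes n uncorrelated normal variables with equal mean (10) and equal variation (v = 0.05), using an arbitrary seed. The variation of the sum should fall roughly as v/√n, while that of the product grows roughly as v√n.

```python
import random
from math import prod, sqrt


def variation(values):
    """Standard deviation divided by the arithmetic mean."""
    m = sum(values) / len(values)
    return sqrt(sum((x - m) ** 2 for x in values) / len(values)) / m


rng = random.Random(8)
N, v, m = 20_000, 0.05, 10.0
sum_v, prod_v = {}, {}
for n in (1, 4, 9):
    rows = [[m * (1 + v * rng.gauss(0, 1)) for _ in range(n)] for _ in range(N)]
    sum_v[n] = variation([sum(row) for row in rows])    # expect about v / sqrt(n)
    prod_v[n] = variation([prod(row) for row in rows])  # expect about v * sqrt(n)
```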

(3) Now we will calculate the correlation between a distinguished
variable $X_i$ and a product $\Pi X_\nu^{k_\nu}$, and between $X_i$ and a sum
$\Sigma X_\nu^{k_\nu}$. This problem has already been solved for a more general
case in a previous article.¹

We take the n variables $X_1, X_2, \dots, X_\nu, \dots, X_n$ and form two
functions $Z = f(X_1, X_2, \dots, X_n)$ and $Z_1 = g(X_1, X_2, \dots, X_n)$.

Let us suppose that the mathematical expectations of the third and
higher powers of the deviations vanish compared to the arithmetic
means. Then the coefficient of correlation between Z and $Z_1$ will, by
reason of formulas (3) and (6), be

$$\rho(Z, Z_1) = \frac{\sum_\nu f_\nu g_\nu \mu_\nu^2 + \sum_{\kappa<\lambda}\mu_\kappa\mu_\lambda r_{\kappa\lambda}\,(f_\kappa g_\lambda + f_\lambda g_\kappa)}{\sqrt{\Big(\sum_\nu f_\nu^2\mu_\nu^2 + 2\sum_{\kappa<\lambda} f_\kappa f_\lambda\,\mu_\kappa\mu_\lambda r_{\kappa\lambda}\Big)\Big(\sum_\nu g_\nu^2\mu_\nu^2 + 2\sum_{\kappa<\lambda} g_\kappa g_\lambda\,\mu_\kappa\mu_\lambda r_{\kappa\lambda}\Big)}}.$$

In this formula f and g with their different indices stand for the values
in the Taylor series of Z and $Z_1$. In the two cases mentioned above,
$Z_1$ is equal to $X_i$. Therefore $g_i = 1$ and $g_\nu = 0$ for $\nu \ne i$, so that

$$\rho(X_i, Z) = \frac{f_i\mu_i + \sum_{\nu\ne i}\mu_\nu f_\nu r_{i\nu}}{\sqrt{\sum_\nu f_\nu^2\mu_\nu^2 + 2\sum_{\kappa<\lambda} f_\kappa f_\lambda\,\mu_\kappa\mu_\lambda r_{\kappa\lambda}}}. \tag{10}$$

This formula gives the value of the correlation between the
distinguished variable $X_i$ and the function $f(X_1, X_2, \dots, X_n)$ if we know
the nature of the function f and the numerical values of the original
correlations. In order to calculate the original correlations we must in
any case calculate the standard deviations; therefore the formula saves
numerical work. It may also serve for checking calculations. The errors
will be trifling, for it is the usual practice to disregard the squares and
higher powers of the deviations compared to the arithmetic means.
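Formula (10) can be evaluated for any function once the means, standard deviations and original correlations are known. In the sketch below the partial derivatives $f_\nu$ are obtained by central differences, which is a numerical shortcut of ours and not part of the original. With a correlation matrix whose diagonal is 1, the full double sum equals the expression under the root in (10). For a product of three uncorrelated variables with equal variations it gives 1/√3 ≈ 0.577.

```python
from math import prod, sqrt


def rho_distinguished(f, i, means, sds, corr, h=1e-4):
    """First-order correlation between X_i and Z = f(X_1, ..., X_n), formula (10).

    corr is the full matrix of original correlations with ones on the diagonal;
    the derivatives f_nu at the means are estimated by central differences."""
    n = len(means)
    grad = []
    for nu in range(n):
        step = h * max(1.0, abs(means[nu]))
        up, down = list(means), list(means)
        up[nu] += step
        down[nu] -= step
        grad.append((f(up) - f(down)) / (2 * step))
    # numerator: f_i mu_i + sum over nu != i of f_nu mu_nu r_i,nu  (corr[i][i] = 1)
    num = sum(grad[nu] * sds[nu] * corr[i][nu] for nu in range(n))
    # full double sum = sum f^2 mu^2 + 2 sum_{kappa<lambda} f f mu mu r
    var = sum(grad[k] * grad[l] * sds[k] * sds[l] * corr[k][l]
              for k in range(n) for l in range(n))
    return num / sqrt(var)


identity = [[1.0 if a == b else 0.0 for b in range(3)] for a in range(3)]
means, sds = [10.0, 20.0, 5.0], [1.0, 2.0, 0.5]   # equal variations of 0.1
rho_product = rho_distinguished(prod, 0, means, sds, identity)
```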

If the new variable Z is homogeneous with regard to all old variables,
and if these are of the same size, i.e., their arithmetic means are about
equal, then all $f_\nu$ are equal and the correlation will be about

$$\rho(X_i, Z) \approx \frac{\mu_i + \sum_{\nu\ne i}\mu_\nu r_{i\nu}}{\sqrt{\sum_\nu\mu_\nu^2 + 2\sum_{\kappa<\lambda}\mu_\kappa\mu_\lambda r_{\kappa\lambda}}}.$$

¹ E. J. Gumbel, “Über die bei Funktionen von Variabeln auftretende Korrelation,” Zeitschrift für
angewandte Mathematik und Mechanik, Vol. 3.


Therefore, under these conditions, the form of the relation of the new
variable Z to the old variables $X_\nu$ has no influence whatever. If, on the
other hand, all standard deviations are of the same size (and this will
be possible if all variables are of the same size), we get approximately

$$\rho(X_i, Z) \approx \frac{f_i + \sum_{\nu\ne i} f_\nu r_{i\nu}}{\sqrt{\sum_\nu f_\nu^2 + 2\sum_{\kappa<\lambda} f_\kappa f_\lambda r_{\kappa\lambda}}}.$$

If the new variable Z is homogeneous with regard to all old variables
$X_\nu$, if all arithmetic means are about equal, and if all standard
deviations are of the same size, then the form of Z has no influence
whatever and we get approximately

$$\rho(X_i, Z) \approx \frac{1 + \sum_{\nu\ne i} r_{i\nu}}{\sqrt{n + 2\sum_{\kappa<\lambda} r_{\kappa\lambda}}}.$$

Lastly, if all correlations between the old variables $X_\nu$ are equal to
+1, the correlation $\rho(X_i, Z)$ too will be +1, since numerator and
denominator then both equal n.

It is impossible for all correlations between the old variables to be
equal to −1 if n > 2. This will be seen at once, because a perfect
negative correlation between the distinguished variable and all other
variables means a perfect positive correlation of those other variables
with each other.

Let us suppose that, out of the $\tfrac12 n(n-1)$ correlations between n old
variables, all n − 1 correlations of the distinguished variable $X_i$ with the
other variables are −1. This is the greatest possible number of perfect
negative values, and the other $\tfrac12(n-1)(n-2)$ correlations are then +1.
If the new variable Z is again homogeneous with regard to all old
variables, and if all arithmetic means and standard deviations are of
the same size, we get as shown before

$$\rho(X_i, Z) \approx \frac{1 + \sum_{\nu\ne i} r_{i\nu}}{\sqrt{n + 2\sum_{\kappa<\lambda} r_{\kappa\lambda}}}.$$

Now in our case

$$\sum_{\nu\ne i} r_{i\nu} = 1 - n \quad\text{and}\quad \sum_{\kappa<\lambda} r_{\kappa\lambda} = -(n-1) + \tfrac12(n-1)(n-2) = \tfrac12\,(n^2 - 5n + 4),$$

so that

$$\rho(X_i, Z) = \frac{2 - n}{\sqrt{n^2 - 4n + 4}}.$$

The sign of the root has to be taken as positive because the root is a
standard deviation. Therefore, for n > 2, $\rho(X_i, Z) = -1$. Again the form
of f has no influence whatever.
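The homogeneous approximation and the two extreme patterns of correlation discussed above can be checked directly. The sketch assumes, as the text does, equal means and standard deviations. Its function names are illustrative.

```python
from math import sqrt


def rho_homogeneous(corr, i):
    """(1 + sum_{nu != i} r_i,nu) / sqrt(n + 2 sum_{kappa<lambda} r_kappa,lambda):
    homogeneous Z, equal means and standard deviations."""
    n = len(corr)
    num = 1 + sum(corr[i][nu] for nu in range(n) if nu != i)
    pairs = sum(corr[k][l] for k in range(n) for l in range(k + 1, n))
    return num / sqrt(n + 2 * pairs)


def extreme_pattern(n, i=0):
    """r = -1 between X_i and every other variable, r = +1 among the others."""
    return [[1 if a == b else (-1 if i in (a, b) else 1) for b in range(n)]
            for a in range(n)]


example = rho_homogeneous(extreme_pattern(5), 0)   # -> -1.0
```

For n = 5 the numerator is 1 − 4 = −3 and the radicand is 5 + 2·2 = 9, so the result is exactly −1. This is the value (2 − n)/(n − 2) found above.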













The only condition is that the function is homogeneous with regard
to all the old variables, and that the means and standard deviations
are of about the same size. If n = 2, calling the variables $X_i$ and $X_\kappa$,
we get, if these variables have a complete negative correlation,

$$\rho[X_i, f(X_i, X_\kappa)] = \frac{\mu_i f_i - \mu_\kappa f_\kappa}{\sqrt{\mu_i^2 f_i^2 + \mu_\kappa^2 f_\kappa^2 - 2 f_i f_\kappa\,\mu_i\mu_\kappa}} = \frac{\mu_i f_i - \mu_\kappa f_\kappa}{|\,\mu_i f_i - \mu_\kappa f_\kappa\,|},$$

which is +1 or −1 according as $\mu_i f_i$ is greater or less than $\mu_\kappa f_\kappa$.
This exceptional case will be treated later in a more exhaustive way.

The spurious correlation contained in this functional coefficient
with repetition of variables is

$$\rho_s(X_i, Z) = \frac{\mu_i f_i}{\sqrt{\sum_\nu \mu_\nu^2 f_\nu^2}}. \tag{11}$$

If f is homogeneous in all variables $X_\nu$, and if all arithmetic means and
standard deviations are of the same size, this will be approximately

$$\rho_s(X_i, Z) \approx \frac{1}{\sqrt n}, \tag{12}$$

i.e., the spurious correlation decreases in inverse proportion to the
square root of the number of variables. This stands to reason: the more
variables occur, the smaller the influence of the distinguished
variable. The value of the spurious correlation can be calculated
neither from the numerical value of the coefficient between the new
variables nor from the original observations, but only with the
aid of formula (11). This marks the value of our theorem.
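A simulation makes the 1/√n law of (12) visible. The sketch assumes uncorrelated normal variables with equal means (10) and equal variations (0.05), and uses an arbitrary seed. It correlates the first of them with the product of all n. Because the old variables are uncorrelated, the whole of the resulting coefficient is spurious.

```python
import random
from math import prod, sqrt


def correlation(xs, ys):
    """Product-moment coefficient from deviations about the means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


rng = random.Random(12)
N, v, m = 20_000, 0.05, 10.0
spurious = {}
for n in (1, 4, 9):
    rows = [[m * (1 + v * rng.gauss(0, 1)) for _ in range(n)] for _ in range(N)]
    # the distinguished variable X_1 is the first factor of the product
    spurious[n] = correlation([row[0] for row in rows], [prod(row) for row in rows])
```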

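In particular we get for the correlation of a distinguished variable
$X_i$ with a product $\Pi X_\nu^{k_\nu}$, by means of formulas (2) and (10),

$$\rho(X_i, \Pi X_\nu^{k_\nu}) = \frac{\mu_i \frac{k_i}{m_i} f(m) + f(m)\sum_{\nu\ne i}\mu_\nu \frac{k_\nu}{m_\nu} r_{i\nu}}{\sqrt{f^2(m)\sum_\nu \frac{k_\nu^2}{m_\nu^2}\mu_\nu^2 + 2f^2(m)\sum_{\kappa<\lambda}\frac{k_\kappa k_\lambda}{m_\kappa m_\lambda}\mu_\kappa\mu_\lambda r_{\kappa\lambda}}}.$$

Expressed by the variations $v_\nu$, we get

$$\rho(X_i, \Pi X_\nu^{k_\nu}) = \frac{v_i k_i + \sum_{\nu\ne i} v_\nu k_\nu r_{i\nu}}{\sqrt{\sum_\nu v_\nu^2 k_\nu^2 + 2\sum_{\kappa<\lambda} v_\kappa v_\lambda k_\kappa k_\lambda r_{\kappa\lambda}}}.$$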


As the standard deviations do not appear separately in this formula,
but only the variations, the absolute size of the original variables is of
no account for this functional correlation. If all original correlations
are equal to zero, the remaining spurious correlation amounts to

$$\rho_s(X_i, \Pi X_\nu^{k_\nu}) = \frac{k_i v_i}{\sqrt{\sum_\nu k_\nu^2 v_\nu^2}}.$$

If all weights are equal to 1, the correlation between a distinguished
variable $X_i$ and a product $\Pi X_\nu$ is

$$\rho(X_i, \Pi X_\nu) = \frac{v_i + \sum_{\nu\ne i} v_\nu r_{i\nu}}{\sqrt{\sum_\nu v_\nu^2 + 2\sum_{\kappa<\lambda} v_\kappa v_\lambda r_{\kappa\lambda}}}.$$

If, besides, all variations are of the same size, the approximate result is

$$\rho(X_i, \Pi X_\nu) \approx \frac{1 + \sum_{\nu\ne i} r_{i\nu}}{\sqrt{n + 2\sum_{\kappa<\lambda} r_{\kappa\lambda}}}.$$

If all original coefficients are positive, the functional correlation
between a distinguished variable and a product of variables is also
positive. If all original correlations are perfect, this functional
coefficient will be perfect too.

If the weight of the distinguished variable is zero and all other
weights are equal to 1, the correlation between a variable and a product
not containing that variable is

$$\rho(X_i, \Pi X_\nu) = \frac{\sum_{\nu\ne i} v_\nu r_{i\nu}}{\sqrt{\sum_{\nu\ne i} v_\nu^2 + 2\sum_{\kappa<\lambda} v_\kappa v_\lambda r_{\kappa\lambda}}}.$$





Here the sums have to be taken over all values of ν with the exception
of i. In the special case $k_1 = -n$, $k_2 = 1$, $k_\nu = 0$ for $\nu \ne 1, 2$, the result is

$$\rho\!\left(X_i, \frac{X_2}{X_1^{\,n}}\right) = \frac{v_2 r_{i2} - n v_1 r_{i1}}{\sqrt{n^2 v_1^2 + v_2^2 - 2n\,v_1 v_2 r_{12}}}. \tag{15}$$

Consequently the coefficient of correlation depends only on the three
coefficients of correlation $r_{12}$, $r_{i1}$, $r_{i2}$ and the two variations $v_1$ and $v_2$,
but not on the variation $v_i$. If all correlations between the old variables
are equal to zero (it being understood that $X_i$ is not identical with
either of the two other variables), this functional coefficient will
also be equal to zero. Taking n = 0 we naturally get $\rho(X_i, X_2) = r_{i2}$.
For increasing n the value of $\rho(X_i, X_2/X_1^{\,n})$ tends towards $-r_{i1}$.

From (15) we get two functional coefficients with repetition of
variables. If we take $X_i = X_1$, the result is $r_{i1} = 1$ and $r_{i2} = r_{12}$.
Calling $r_{12} = r$, we get

$$\rho\!\left(X_1, \frac{X_2}{X_1^{\,n}}\right) = \frac{v_2 r - n v_1}{\sqrt{n^2 v_1^2 + v_2^2 - 2n\,v_1 v_2 r}}. \tag{16}$$

The greater n, the nearer this functional coefficient will be to the
value −1; the smaller n (that is, the larger its negative value), the
nearer it will be to +1. The reason, of course, is that the correlation
between $X_1$ and $X_1^{-n}$ is equal to −1 if n is positive and equal to +1
if n is negative. This influence dominates the correlation between $X_1$
and $X_2/X_1^{\,n}$ as |n| increases.

Now the sign of the root, which is a standard deviation, is always
positive. Therefore the values of the functional coefficient which
belong to the limiting values r = +1 or r = −1 can be seen from the
following schedule:








       | ρ(X₁, X₂/X₁ⁿ) = +1                 | ρ(X₁, X₂/X₁ⁿ) = −1
r = +1 | n ≤ 0, or n > 0 and v₂ > n v₁      | n > 0 and v₂ < n v₁
r = −1 | n < 0 and v₂ < −n v₁               | n ≥ 0, or n < 0 and v₂ > −n v₁

The two limiting values of r run down the side of the schedule, and the
limiting values of the functional coefficient ρ run across the top. The
cells contain necessary and sufficient conditions for the coexistence of
those values. It will be seen that the sign of the functional coefficient
is by no means identical with the sign of the original coefficient which
it contains.

If, on the other hand, we take $X_i = X_2$, we get $r_{i2} = 1$ and $r_{i1} = r_{12} = r$.
Therefore formula (15) becomes

$$\rho\!\left(X_2, \frac{X_2}{X_1^{\,n}}\right) = \frac{v_2 - n v_1 r}{\sqrt{n^2 v_1^2 + v_2^2 - 2n\,v_1 v_2 r}}. \tag{17}$$

The greater n, the more this functional coefficient approaches the
value −r; the smaller n, the more it approaches +r. In both cases
the factor $X_1^{\,n}$ predominates over the factor $X_2$ connected with it.
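Formulas (16) and (17) are easy to tabulate. The sketch below uses illustrative names and variations. It reproduces several cells of the schedule for r = ±1 and the limits −1, +1, −r and +r described in the text.

```python
from math import sqrt


def rho_16(n, v1, v2, r):
    """Formula (16): correlation of X1 with X2 / X1**n."""
    return (v2 * r - n * v1) / sqrt(n * n * v1 * v1 + v2 * v2 - 2 * n * v1 * v2 * r)


def rho_17(n, v1, v2, r):
    """Formula (17): correlation of X2 with X2 / X1**n."""
    return (v2 - n * v1 * r) / sqrt(n * n * v1 * v1 + v2 * v2 - 2 * n * v1 * v2 * r)


# r = +1, n > 0 and v2 < n v1: the schedule gives -1
cell = rho_16(2, 0.1, 0.1, 1.0)
```

The cases v₂ = n v₁ with r = +1, and v₂ = −n v₁ with r = −1, make the root vanish. The coefficient is then undefined, which is why the schedule uses strict inequalities there.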




The values of the functional coefficient which belong to the limiting
values r = ±1 are given by the following schedule, the meaning
of which is the same as that of the first one.

       | ρ(X₂, X₂/X₁ⁿ) = +1                 | ρ(X₂, X₂/X₁ⁿ) = −1
r = +1 | n ≤ 0, or n > 0 and v₂ > n v₁      | n > 0 and v₂ < n v₁
r = −1 | n ≥ 0, or n < 0 and v₂ > −n v₁     | n < 0 and v₂ < −n v₁

It shows that the sign of the functional coefficient is not identical
with the sign of the original coefficient which it contains.

The spurious correlations contained in the functional correlations
with repetition of variables are

$$\rho_s\!\left(X_1, \frac{X_2}{X_1^{\,n}}\right) = \frac{-n v_1}{\sqrt{n^2 v_1^2 + v_2^2}} \tag{18}$$

and

$$\rho_s\!\left(X_2, \frac{X_2}{X_1^{\,n}}\right) = \frac{v_2}{\sqrt{n^2 v_1^2 + v_2^2}}. \tag{19}$$

The spurious correlation (18) is positive for negative n and negative
for positive n, and naturally equal to zero if n equals zero. Like the
functional correlation, it tends towards −1 as n increases and towards
+1 as n decreases. If n is finite, this spurious correlation is always
less than 1 in absolute value, as it must be. The negative sign that
goes with a positive n results again from the same circumstance we
indicated above when we treated the functional correlation. The
spurious correlation (19) is always positive. If n is equal to zero it is 1,
representing a correlation between identical variables. As the absolute
value of n increases, it tends towards 0.
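Setting r = 0 in (16) and (17) gives (18) and (19). The following sketch checks the limiting behaviour just described, using illustrative variations v₁ = v₂ = 0.1.

```python
from math import sqrt


def rho_s_18(n, v1, v2):
    """Spurious correlation (18) between X1 and X2 / X1**n."""
    return -n * v1 / sqrt(n * n * v1 * v1 + v2 * v2)


def rho_s_19(n, v1, v2):
    """Spurious correlation (19) between X2 and X2 / X1**n."""
    return v2 / sqrt(n * n * v1 * v1 + v2 * v2)


limits = {n: (rho_s_18(n, 0.1, 0.1), rho_s_19(n, 0.1, 0.1)) for n in (-1000, -1, 0, 1, 1000)}
```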


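(4) Formulas (7) and (10) give for the correlation of a distinguished
variable $X_i$ and a sum $\Sigma X_\nu^{k_\nu}$

$$\rho(X_i, \Sigma X_\nu^{k_\nu}) = \frac{k_i\mu_i m_i^{k_i-1} + \sum_{\nu\ne i} k_\nu\mu_\nu m_\nu^{k_\nu-1} r_{i\nu}}{\sqrt{\sum_\nu k_\nu^2 m_\nu^{2k_\nu-2}\mu_\nu^2 + 2\sum_{\kappa<\lambda} k_\kappa k_\lambda\, m_\kappa^{k_\kappa-1} m_\lambda^{k_\lambda-1}\mu_\kappa\mu_\lambda r_{\kappa\lambda}}}.$$

Introducing the variations, the result is

$$\rho(X_i, \Sigma X_\nu^{k_\nu}) = \frac{k_i v_i m_i^{k_i} + \sum_{\nu\ne i} k_\nu v_\nu m_\nu^{k_\nu} r_{i\nu}}{\sqrt{\sum_\nu k_\nu^2 v_\nu^2 m_\nu^{2k_\nu} + 2\sum_{\kappa<\lambda} k_\kappa k_\lambda\, v_\kappa v_\lambda\, m_\kappa^{k_\kappa} m_\lambda^{k_\lambda} r_{\kappa\lambda}}}. \tag{20}$$

If all the original coefficients are equal to zero, the remaining spurious
correlation will be

$$\rho_s(X_i, \Sigma X_\nu^{k_\nu}) = \frac{k_i v_i m_i^{k_i}}{\sqrt{\sum_\nu k_\nu^2 v_\nu^2 m_\nu^{2k_\nu}}}. \tag{21}$$

If all weights are equal to 1, the correlation between a distinguished
variable and a sum is

$$\rho(X_i, \Sigma X_\nu) = \frac{\mu_i + \sum_{\nu\ne i}\mu_\nu r_{i\nu}}{\sqrt{\sum_\nu\mu_\nu^2 + 2\sum_{\kappa<\lambda}\mu_\kappa\mu_\lambda r_{\kappa\lambda}}}. \tag{22}$$

This formula is exact and differs in this regard from formulas (13) to
(19): in this case the second and higher differential quotients in the
Taylor series vanish. Furthermore, in contrast to the preceding
cases, it is not the variations but the standard deviations which play
a rôle, so that here the size of the variables is of importance. If,
for instance, all variables are of the same size, all standard deviations
may be equal, and the formula leads to

$$\rho(X_i, \Sigma X_\nu) \approx \frac{1 + \sum_{\nu\ne i} r_{i\nu}}{\sqrt{n + 2\sum_{\kappa<\lambda} r_{\kappa\lambda}}} = \rho(X_i, \Pi X_\nu).$$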


Thus we get the same formula for the correlation of a distinguished
variable and the product of variables as for the correlation of a dis-
tinguished variable and a sum of variables, provided all variables are
of the same size, i. e., all arithmetic means and standard deviations
are equal. This agrees with our more general theorem that the
form of the new variable plays no rôle in correlation provided the
function is homogeneous and the variables are of the same size.

If all original coefficients are positive, the functional coefficient too 
is positive. If all original correlations are perfect and positive, the 
correlation between a distinguished variable and a sum of completely 
correlated variables is also perfect and positive. 
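Formula (22), and the weighted form behind (20), can be checked numerically. The sketch below is a minimal illustration, not the author's own procedure; the function name `rho_with_sum` and all numerical values are hypothetical. It works with standard deviations directly ($\sigma_\nu = v_\nu m_\nu$) and uses the full correlation matrix with $r_{\nu\nu} = 1$, so that a double sum over all $\nu, \mu$ equals the denominator of (20).

```python
import math


def rho_with_sum(i, sigma, r, k=None):
    """Correlation of X_i with the weighted sum k_1 X_1 + ... + k_n X_n.

    sigma: standard deviations sigma_nu (equal to v_nu * m_nu)
    r:     full correlation matrix r[nu][mu], with r[nu][nu] == 1
    k:     weights k_nu; all 1 by default, which gives formula (22)
    """
    n = len(sigma)
    if k is None:
        k = [1.0] * n
    # Numerator: k_i sigma_i + sum_{nu != i} k_nu sigma_nu r_{i nu};
    # because r[i][i] == 1 this is a single sum over all nu.
    num = sum(k[nu] * sigma[nu] * r[i][nu] for nu in range(n))
    # Denominator: standard deviation of the sum. The double sum over all
    # (nu, mu) equals sum k^2 sigma^2 + 2 * sum_{nu<mu} k k sigma sigma r.
    var = sum(k[nu] * k[mu] * sigma[nu] * sigma[mu] * r[nu][mu]
              for nu in range(n) for mu in range(n))
    return num / math.sqrt(var)


# Hypothetical correlation matrix for three variables.
R = [[1.0, 0.2, 0.5],
     [0.2, 1.0, 0.1],
     [0.5, 0.1, 1.0]]

# Equal standard deviations: (22) reduces to (1 + sum r_{i nu}) / sqrt(n + 2 sum r_{nu mu}).
equal_sd = rho_with_sum(0, [2.0, 2.0, 2.0], R)
reduced = (1 + 0.2 + 0.5) / math.sqrt(3 + 2 * (0.2 + 0.5 + 0.1))
print(equal_sd, reduced)

# All original correlations perfect and positive.
ones = [[1.0] * 3 for _ in range(3)]
print(rho_with_sum(0, [1.0, 3.0, 0.5], ones))
```

The last call reflects the statement above: a sum of completely correlated variables correlates perfectly with each of its terms, whatever their standard deviations.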

The spurious correlation between a distinguished variable and a
sum of variables is

$${}_0\rho\Big(X_i, \sum X_\nu\Big) = \frac{\sigma_i}{\sqrt{\sum_\nu \sigma_\nu^2}}. \qquad (23)$$
















73] Spurious Correlation and Its Significance to Physiology 191 


Therefore it is always positive. The greater the distinguished vari- 
able compared to the other variables, the greater the spurious corre- 
lation. If the other variables disappear compared with it, the spuri- 
ous correlation approaches perfect correlation. If, however, the 
weight of the distinguished variable is zero and all the other weights 
equal to 1, the correlation between a variable and a sum, in which 
this variable does not exist, will be 


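$$\rho\Big(X_i, \sum X_\nu\Big) = \frac{\sum \sigma_\nu r_{i\nu}}{\sqrt{\sum \sigma_\nu^2 + 2 \sum_{\nu < \mu} \sigma_\nu \sigma_\mu r_{\nu\mu}}},$$

where the sums have to be taken over all values of $\nu$ (and $\mu$) with the exception of $i$.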


(5) This problem of calculating spurious functional coefficients of
correlation has a bearing on physiology.

Let L be the length of the human body, and Q a fictitious transverse
section of the body, defined by the relation that L multiplied by Q
multiplied by the density of the body equals the weight. For the sake
of simplicity we assume that the density of the body is constant. Then
physiology is interested in the correlation between L and the new
variable $Q/L^n$, and in the similar correlation between Q and $Q/L^n$.
Taking $L = X_1$ and $Q = X_2$, we get the coefficient $\rho(L, Q/L^n)$
from (16) and the coefficient $\rho(Q, Q/L^n)$ from (17), and the spurious
correlation contained in both from (18) and (19).

If n is equal to −1, we get from (16) the correlation between the
length of the body and the weight; if n is equal to 0, we get the ordinary
coefficient of correlation between length and transverse section. If
$n \geq 1$ we get the correlation between the length and the quotient of
the transverse section divided by the nth power of the length.

As a rule, such functional coefficients are calculated directly from
observations of length of body and weight with the aid of the usual
methods of correlation. Kaup¹, for instance, got from the German
male material investigated by him an $r$ of the size 0.3; $\rho(L, Q/L)$
of the size 0; $\rho(L, Q/L^2)$ of the size $-0.5$; $\rho(L, Q/L^3)$ of the
size $-0.6$.
According to these values of Kaup, the correlation between length 


¹ “Korrelationskoeffizient und funktionelle Abhängigkeit von Körpermassen,” Sitzungsberichte der
Gesellschaft für Morphologie und Physiologie, Vol. 34, pp. 39-72, Munich, 1923.














and weight $\rho(L, Q \cdot L)$ is about 0.5. Estimating the weight from the
length, as did Fabio Frasetto,¹ is therefore rather uncertain.
The functional coefficient of correlation $\rho(L, Q/L^n)$ will be negative
when $n > r v_2 / v_1$. As $r v_2 - v_1$ is, according to Kaup, of the size
zero, this will be the case for $n > 1$.

In particular we get for $n = 1, 2, 3$ spurious correlations of $-0.3$,
$-0.5$, $-0.7$, which are contained in the observed values.
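Formulas (16) and (17) are not reproduced in this section. The sketch below assumes their usual first-order (Taylor-series) form for $\rho(X_1, X_2/X_1^n)$ and $\rho(X_2, X_2/X_1^n)$, which agrees with the sign conditions $n > r v_2/v_1$ and $n > v_2/(r v_1)$ and with the $n = -1$ formula quoted in the text; it is an illustration, not the paper's derivation. Setting $r = 0$ isolates the spurious part. The variations `v1` (length) and `v2` (transverse section) are hypothetical, chosen only so that their ratio is roughly compatible with the figures attributed to Kaup; $r = 0.3$ is the value quoted above, and the function names are illustrative.

```python
import math


def rho_L(n, r, v1, v2):
    """Assumed first-order form of (16): corr(X1, X2 / X1**n), X1 = L, X2 = Q."""
    den = math.sqrt(v2**2 + n**2 * v1**2 - 2 * n * r * v1 * v2)
    return (r * v2 - n * v1) / den


def rho_Q(n, r, v1, v2):
    """Assumed first-order form of (17): corr(X2, X2 / X1**n)."""
    den = math.sqrt(v2**2 + n**2 * v1**2 - 2 * n * r * v1 * v2)
    return (v2 - n * r * v1) / den


r = 0.3                # Kaup's r between length and transverse section
v1, v2 = 0.031, 0.10   # hypothetical variations of L and Q

# Spurious parts (all original coefficients set to zero) for n = 1, 2, 3.
spurious = [round(rho_L(n, 0.0, v1, v2), 2) for n in (1, 2, 3)]
print(spurious)

# n = -1 gives length against weight (Q * L); n = 1 gives L against Q / L.
print(round(rho_L(-1, r, v1, v2), 2), round(rho_L(1, r, v1, v2), 2))
```

That the rounded spurious values land near $-0.3$, $-0.5$, $-0.7$ is only a consistency check of the assumed form with these hypothetical variations, not a reconstruction of Kaup's data. `rho_Q` gives 1 at $n = 0$ and, with $r = 0$, stays positive and tends to 0 as $n$ grows, as stated for (19).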

From (17) we get for $n = -1$ the correlation between $Q$ and $Q \cdot L$,
which is of special physiological interest.

Supposing that the breast circumference is proportional to the transverse
section $Q$, the following may be taken for the correlation between the
weight of the body and the breast circumference:

$$\rho(Q, Q \cdot L) = \frac{v_2 + r v_1}{\sqrt{v_1^2 + v_2^2 + 2 r v_1 v_2}}.$$

This formula also results from (16) if we insert $n = -1$ and interchange
the variables length and transverse section. For $n = 0$ we get
$\rho(Q, Q/L^0) = \rho(Q, Q) = 1$, which represents a correlation between
identical variables. The functional coefficient $\rho(Q, Q/L^n)$ will be
negative when $n > v_2/(r v_1)$. Therefore, according to the indications
of size mentioned above, this will only be the case for $n > 9$. This
functional correlation will therefore be positive for all values which
are of interest for physiology.
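The statement that the breast-circumference formula also follows from (16) with $n = -1$ and the roles of length and transverse section interchanged can be checked numerically. Here (16) is taken in an assumed first-order form, not reproduced from the paper; the function names and the variations `v1`, `v2` are hypothetical, and $r = 0.3$ is Kaup's value as quoted.

```python
import math


def rho_first_order(n, r, v_a, v_b):
    """Assumed first-order form of (16): corr(A, B / A**n) from the variations of A and B."""
    den = math.sqrt(v_b**2 + n**2 * v_a**2 - 2 * n * r * v_a * v_b)
    return (r * v_b - n * v_a) / den


def rho_Q_QL(r, v1, v2):
    """The formula in the text: corr(Q, Q * L); v1 = variation of L, v2 of Q."""
    return (v2 + r * v1) / math.sqrt(v1**2 + v2**2 + 2 * r * v1 * v2)


r, v1, v2 = 0.3, 0.031, 0.10   # r as quoted from Kaup; v1, v2 hypothetical

direct = rho_Q_QL(r, v1, v2)
# Interchange the variables: A = Q (variation v2), B = L (variation v1), n = -1,
# so that B / A**n = L * Q.
swapped = rho_first_order(-1, r, v_a=v2, v_b=v1)
print(direct, swapped)
```

The two values agree to rounding error, since with $n = -1$ both numerator and denominator of the assumed (16) turn into those of the displayed formula once $v_1$ and $v_2$ are exchanged.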


(6) Two special cases of our problem have a bearing on the measurement
of the human skull.² Here the correlation is formed between the length
$L$, breadth $B$ and height $H$ and the indices $B/L$, $H/L$, $H/B$
calculated from these values. The resulting coefficients

$$\rho\Big(L, \frac{H}{L}\Big),\ \rho\Big(B, \frac{L}{B}\Big),\ \rho\Big(H, \frac{B}{H}\Big),\ \rho\Big(L, \frac{B}{L}\Big),\ \rho\Big(B, \frac{H}{B}\Big),\ \rho\Big(H, \frac{L}{H}\Big)$$

are of the form $\rho(X_1, X_2/X_1)$;


¹ Academia dei Lincei, 30 Ser. V., Roma, 1922.
² Karl Pearson and Adelaide G. Davin, “On the Biometric Constants of the Human Skull,” Biometrika,
Vol. 16, Nr. 3/4, 1924.










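the coefficients

$$\rho\Big(L, \frac{B}{H}\Big),\ \rho\Big(B, \frac{H}{L}\Big),\ \rho\Big(H, \frac{L}{B}\Big),\ \rho\Big(L, \frac{H}{B}\Big),\ \rho\Big(B, \frac{L}{H}\Big),\ \rho\Big(H, \frac{B}{L}\Big)$$

are of the form $\rho(X_3, X_2/X_1)$;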


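and lastly the coefficients

$$\rho\Big(L, \frac{L}{B}\Big),\ \rho\Big(L, \frac{L}{H}\Big),\ \rho\Big(B, \frac{B}{L}\Big),\ \rho\Big(B, \frac{B}{H}\Big),\ \rho\Big(H, \frac{H}{L}\Big),\ \rho\Big(H, \frac{H}{B}\Big)$$

are of the form $\rho(X_2, X_2/X_1)$.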


Thus if we have ascertained the three coefficients of correlation and 
the variations of the values L, B, H, these coefficients may be cal- 
culated from (15), (16) and (17), and the spurious correlations con- 
tained in the observed coefficients from (18) and (19). 

If we know 6 coefficients, these formulas also admit the calculation
of the 12 others; the only thing to be done is to put n = 1 and to replace
the variables used there by those used here.

Attention must be given to the existence or absence of mutual
factors in the correlations of length and indices. As Pearson has shown,
it is only in the first case that there exists a correlation. Length,
breadth and height have such correlations with the indices in which
they occur, and practically none in other cases.

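Therefore, the correlations which do not vanish are for the greatest
part spurious. The proof is quite simple. Writing $L = X_1$, $B = X_2$,
$H = X_3$, the relations

$$\rho\Big(H, \frac{B}{L}\Big) = 0, \qquad \rho\Big(L, \frac{H}{B}\Big) = 0, \qquad \rho\Big(B, \frac{H}{L}\Big) = 0$$

mean according to (15) that

$$v_2 r_{2,3} = v_1 r_{1,3}, \qquad v_2 r_{1,2} = v_3 r_{1,3}, \qquad v_1 r_{1,2} = v_3 r_{2,3}.$$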


As the variations differ from each other the only solution of this 










system of equations is $r_{1,2} = r_{1,3} = r_{2,3} = 0$, which proves the
above statement.

The correlation of a distinguished variable with a sum is of importance
for the measurement of the human skull. It appears in the form of the
correlation between a variable and a second one which contains the first.
This correlation is given by (22), and the spurious correlation contained
in it by (23), if we put n = 2.

Thus we have reduced the variations of a product of variables to 
the variations of the separate variables (5), and the variations of a 
sum, to the standard deviations (9). While the variation of a product 
is of the size of the sum of the variations, the variation of a sum is of 
the size of the variation of one variable. Furthermore we have cal- 
culated the correlation of a distinguished variable with a product 
(13) as well as with a sum (22) and the spurious correlations contained 
therein (14) and (23). They too differ in general, because in the case
of the product the correlation is expressed by the variations, and in the
case of the sum by the standard deviations. In a special case both are
identical.

Examples taken from physiology and craniology have shown the 
use of these formulas. 


List of Mathematical Symbols

$r_{xy}$ Coefficient of correlation between the variables X and Y.

$m_\nu$ Arithmetic mean of the variable $X_\nu$.

$\sigma_\nu$ Standard deviation of the variable $X_\nu$.

$v_\nu$ Variation of the variable $X_\nu$.

$\rho(X_i, Z)$ Functional coefficient of correlation between a distinguished variable $X_i$ and a function $Z$ of old variables $X_1, X_2, \ldots, X_n$.

${}_0\rho$ Spurious correlations.

$M$ Arithmetic mean of a function $f(X_1, X_2, \ldots, X_n)$.

$f_i, f_\nu, f_\mu, f_\kappa$ Values which appear by expanding $f(X_1, X_2, \ldots, X_n)$ in Taylor's series.

p Standard deviation of a function $f(X_1, X_2, \ldots, X_n)$.

$V$ Variation of a function $f(X_1, X_2, \ldots, X_n)$.

$k_\nu$ Weight of the variable $X_\nu$.

$r_{\nu\mu}$ Coefficient of correlation between the variables $X_\nu$ and $X_\mu$.

$n$ Number of variables.

$n$ A natural number.






















NOTES 


CLASSIFICATION OF CAUSES OF DEATHS AND 
DEATH REGISTRATION 


I. PRINCIPLE AND OBJECTS OF THE CLASSIFICATION OF THE CAUSES 
OF DEATH! 


By E. Roesle, M.D., Berlin


1. System of Classification 

The first principle of any classification should be systematic ar- 
rangement. For this reason the system of the classification for causes 
of death must include all groups of diseases, in order to enable any
cause of death to be classified in the group to which by its characteristics
it belongs. A collective group labelled “other diseases” must be
regarded as illogical in a system of classification.

The detailed International List of Causes of Death conforms to 
this principle, while such is not the case with the short list. The latter 
includes 36 important groups of diseases, or parts of groups of diseases 
under special headings, which results in the necessity of a collective
group, “other diseases.” A similar collective group exists also in the
lists of causes of death employed in various countries not using the 
International List—such as Italy and Denmark—although information 
in regard to a large number of diseases is obtained. 

The proportion of the number of deaths from “other diseases” to 
that of deaths from all causes gives a valuable indication of the inade- 
quacy of short and unsystematic lists of causes of death. 

During years of normal mortality conditions, for example, the use of
the German short list results in the grouping of one-eighth of all
deaths (1913) under the heading “other diseases,” while with the
Austrian list this proportion of “other diseases” amounts to 50 per
cent (1913), and with the use of the short International List of causes
of death in Czechoslovakian cities and towns (1923) this proportion is
one-sixth.

It is evident that tabulation of causes of death according to various 
unsystematic lists has a very unfavorable influence on international 
comparability. Data can be compared only between countries where 
causes of death are given according to a systematic list including all 
diseases. 


1 Address delivered before the third collective study of medical statistics at Geneva, August, 1925. 










It is, therefore, in the interest of the central or local administration, 
as well as of medical and statistical science to establish complete 
systematic lists of causes of death, as the collective group “other
diseases” is of neither practical nor scientific value. It is the purpose
of statistics to establish facts and to promote scientific knowledge; this 
can be accomplished only by systematic specification of its results. The 
shorter the list of causes the less the value of the statistics for practical 
administrative and scientific purposes. It must be considered that, in 
the absence of general morbidity statistics, mortality statistics form 
the only available measure of the sanitary conditions of a population. 
Incomplete statistics necessarily result in an incomplete picture of 
sanitary conditions. The improvement of these statistics is the duty 
of every country which desires to investigate the sanitary environment 
of its population and not to confine its preventive measures merely to
infectious diseases.


2. The Principles of the Arrangement of Groups of Diseases 


In most lists of causes of death the following arrangement is observed: 
Infectious and general diseases are placed first, and violent causes at 
the end, i.e. before the last group (that of insufficiently defined diseases), 
while the intervening space is occupied by organic and other diseases. 
No principle can be invoked to justify this arrangement. A general 
principle exists, on the other hand, for the selection of joint causes of 
death, namely that of selecting the cause which is most important
to the state for purposes of administration and public health. The 
disease groups might similarly be logically arranged according to their 
administrative importance. Violent deaths should be placed first in 
order to conform to this principle. An absolute separation of violent 
from natural causes of death is equally desirable for biological reasons, 
since biology deals only with natural causes of death, while it is mainly 
the police organization which is interested in deaths from violence. 
Hence we have to divide causes of death into two primary groups: 
(A) violent causes of death; (B) deaths from natural causes. 

In conformity with this principle of the selection of joint causes of 
death, individual groups of death from natural causes should be ar- 
ranged according to their importance, and this may be done by three 
main groups: (a) general diseases, (b) other diseases of special impor- 
tance, (c) organic diseases. 

The first main group (a) is at present universally divided into: (1) 
infectious diseases, (2) general diseases proper. All diseases belonging 
to this group are of practical importance from the standpoint of pro- 
phylaxis. These must therefore be specified by name, and must be 





79] Notes 197 


given preference for tabulation except where combined with violent 
causes. The arrangement of individual diseases in each particular 
group must again, therefore, be made according to their general im- 
portance. This rule, however, is not at present observed. 

The second main group, that of special diseases, (b), should include 
four sub-groups as follows: (1) puerperal diseases, (2) deformities, 
(3) infantile diseases, (4) senile diseases. These are diseases of groups
VIII and X to XII of the International List, which are disadvantageously
placed.

As to diseases in the third main group, organic diseases, (c), it is 
doubtful whether their arrangement should be made according to 
Raymond Pearl’s suggestion, i.e. in the order of development of 
individual organs from the germplasm, or in order of importance. 
This might be decided when the results of Pearl’s investigations are 
more generally confirmed. 

These general points of view should be taken into consideration 
when the International List is revised. It would seem advisable, as
the next revision will not take place before 1930, to collect in one place
all proposals on this subject, since such a list retains its value only
if it meets the requirements of practical public health administration 
and at the same time favors the progress of science; otherwise the list 
would become antiquated, as has been the case with many classifica- 
tions formerly employed in different countries. It might be of interest 
to countries still retaining their own systems of classification, to realize 
by an example like the following the unfavorable influence on interna- 
tional comparability of a rigid system of classification. In the German 
nomenclature of 1904, diseases of the thyroid gland were included 
among diseases of the digestive organs, while paratyphoid fever, 
Weil’s disease and others (at that time not considered as infectious) 
were placed in the group “other diseases.” This illustrates sufficiently
the resultant incomparability of the tabulation of digestive and 
infectious diseases. 

These few examples, of which more could easily be given, illustrate 
the necessity for a uniform classification corresponding to the progress 
of science, if mortality statistics are to be internationally comparable. 


3. Remarks on Joint Causes of Death 


One weak point in mortality statistics has still to be considered. 
Lists of causes of death are usually so arranged that only one cause of 
death can be shown. The chaos caused by this arrangement during the 
1918 influenza epidemic has probably not been quite realized. Fortu- 
nately this chaos made it possible to introduce into the International 










List the separate grouping of all influenza deaths with pneumonic
complications. It is equally necessary to specify other diseases termi- 
nating fatally owing to complications such as septicemia, purulent 
meningitis, and peritonitis, which are given as separate causes of 
death and which form a very variable proportion of causes of death 
according to the accuracy of certification in the various localities. 
One may suspect that only the complication is given on the death cer- 
tificate in some cases and not the primary cause of death. This source 
of error could very easily be removed, if, as in the case of influenza, all 
diseases with frequent fatal complications were divided into two groups, 
i. e. without and with complication. This division would not only facili- 
tate certification, but would also induce physicians (who would readily 
perceive the value of the question) to make a more accurate statement 
concerning joint causes of death. It would also have an educational 
value for physicians, who look upon the present system of enumerating 
causes of death as too crude to justify the trouble required to solve the 
problem of related diseases, as demanded in the death certificate. 

I wish to emphasize that these proposals would not impair compara- 
bility with earlier records as no change is involved, but only the transfer 
of causes of death according to their practical importance, with the 
object of making the returns more valuable to the administration as 
well as of establishing rules for selection of combined causes of death. 

This is not to be interpreted as meaning that I do not advocate any 
changes in the individual titles. However, I reserve the right to bring 
up the suggested changes before the next revision of the International 
List. On the basis of my experience gathered by visiting different 
countries, I might call your attention to a deficiency which may be 
disposed of immediately. This is the division of deaths from puerperal
fever, i. e. those occurring post partum, and those known as post abortum. 
The certificate of death post abortum is possible in countries where the 
names of women giving birth to a child and those dying from puerperal 
fever are given; the birth certificate corresponding to the particular 
death certificate could in such cases be easily traced. This is not the 
case in Switzerland where an anonymous death registration system 
exists; however, experience in that country shows that physicians, for 
obvious reasons, do not hesitate to state the real circumstances of the 
case. The number of deaths from puerperal fever post abortum in 
Swiss, as well as in German cities, is greater than the number of deaths 
post partum. As the latter might be related to the number of births, 
one would arrive at misleading conclusions if death from puerperal 
fever were not divided as suggested. This example amply illustrates 
the fact that even the specification of causes of death in the detailed 







list is not wholly satisfactory for the purpose of international compari- 
son. As, however, the purpose of the International List is to facilitate 
such comparisons, the object of our endeavor must be to find such 
sources of error and to correct them. 


II. SPECIAL CONSIDERATIONS INVOLVED IN THE CERTIFICATION OF 
REGISTRATION OF DEATHS BY VIOLENCE¹


By E. Roesle, M.D., Berlin


We speak of the natural movement of the population and of the 
natural order of survival, as if all men died a natural death. If, how- 
ever, we examine statistics of causes of death, we find that a large 
number of unnatural deaths occur. In the age classes of middle life 
and for the male sex, death from unnatural causes is the second most 
frequent cause of death, tuberculosis coming first. 

This fact removes any doubt as to whether our life-tables, as at 
present drawn up, are worthy of consideration from the biological 
point of view. I believe that not only medical statistics, but also the 
science of insurance, would profit by reliable information on actual 
mortality from natural causes. 

To this end, a distinct separation is made in our lists, between violent 
and natural causes of death, thus providing data for all those interested 
in such statistics, since the practical interest of the health adminis- 
tration centers in natural causes of death, while police organization is 
concerned only with violent deaths. 

It might be suggested that this separation should be made also in 
the list of causes of death. At first sight, this separation appears to be 
a simple matter, since it is only necessary to group together deaths from 
accident, murder, execution and suicide in order to obtain in this way 
the total of causes of death by violence. 

However, practical experience shows that many difficulties exist. 
The International List takes no account whatever of these difficulties 
since it gives no definition of the term “death by violence.” Only as
regards suicide does the list give the following explanation: “Under
suicides, only deaths of persons with regard to whom suicide or
attempted suicide is established, are to be included.”

But it is recognized that not only attempted suicide but also acci- 
dents may result in diseases which influence the course of other illnesses 
even to the extent of overshadowing the real primary cause of death. 
Here again arises the danger of asking simply for the principal cause 
of death, since the doctor, in many cases, states in the certificate only 


1 Address delivered before the third collective study of medical statistics at Geneva, August, 1925. 













the secondary disease, which he has been treating. Consequently 
many deaths due to unnatural causes are classed among natural deaths. 

The police organization in many countries has long realized that 
statistics established by medical practitioners relating to violent 
causes of death cannot be utilized at all and it has therefore established 
a system of its own. Thus, Germany possesses two separate sets of 
statistics on suicide, one by the sanitary administration and the other 
by the police. I need hardly point out that the two results are not in
agreement. Similarly some German states have separate police 
statistics of causes of deaths. These police statistics are remarkable 
for their detail as compared with the ordinary death-by-accident 
statistics and therefore they possess a real practical value for the 
police administration at least. 

On the other hand such statistics have no practical value for a sani- 
tary administration, except perhaps those accidents in factories, etc., 
which, however, do not terminate fatally and are naturally not re- 
corded in cause of death statistics. But the value of such statistics 
can be only theoretical for a sanitary administration as also for medical 
statistics. For the separation of violent from natural causes the police 
data are insufficient, since they are naturally confined to cases re- 
ported by the police themselves, and possess no biological value. 
Hence the task of medical statistics is to give deaths their biological 
value, that is, to discover whether the cause of the death is of exogenous 
or endogenous origin, “exogenous” indicating a violent influence.
The achievement of this task is not difficult in cases where death oc- 
curs immediately or soon after the violent influence; but it becomes pro- 
portionately more difficult the slighter this influence or the later the 
occurrence of the disease, causing death. 

Accuracy in these cases can only be secured by suitably wording the 
question on the death certificate, that is by special questions regarding 
violent causes of death, in order that the medical practitioner may per- 
ceive that such causes are dealt with separately in the statistics. 
Causes of death such as “pyemia,” “septicemia” and other such
conceptions of the cause of death should be removed from the list, as they
cannot be taken into consideration from the biological point of view.
In their place must come the primary cause, which, if violent, should be
fully enumerated as such.

If we succeed in thus obtaining a complete representation of the 
influence of violent causes of death on mortality, then we also obtain a 
real biological picture of mortality conditions by considering separately 
statistics on natural causes of death and natural mortality. The lat- 
ter would be of far greater value to international comparison than 











general mortality statistics, and would enable a much better judgment 


to be formed of the hygienic conditions in the different countries than 
by means of general mortality statistics only. 


III. AN AMERICAN POINT OF VIEW 


By George H. Van Buren, Supervisor, Statistical Bureau,
Metropolitan Life Insurance Company 


No one will question Dr. Roesle’s statement that “The shorter the
list of causes the less the value of the statistics for practical adminis-
trative and scientific purposes.” It is, nevertheless, a fact that the
short or “Abridged International List of Causes of Death” serves very
well the purpose for which it is intended and that many offices which
publish reports on vital statistics could not get along without it. It
is also a fact that even those offices which use the long or “detailed”
list find themselves compelled to employ the shorter one as well.
The necessity for this arises from several considerations, the most im- 
portant of which is the fact that altogether too bulky volumes would 
be required if the attempt were made to publish all of the tables ac- 
cording to the long list. I think it is safe to say that in this country 
there is no state and no city whose vital statistics budget would pro- 
vide for the publication of the volumes that would be necessary if the 
detailed list alone were used. And even the United States Census 
Bureau finds it altogether better to use the short list for some of its 
tables. 

Let us suppose, for example, that the vital statistician wishes to 
show the death-rates with distinction of cause of death, color, the total 
for all rural districts and all cities in each state, and totals for each 
separate city and county. He might even desire to go further and show 
these facts with distinction of sex. It is not difficult to see that such a 
table, in itself, would make up a good sized volume—one of over 200 
pages. It appears, therefore, that the abridged list is a necessity; and 
it is a fact that for some countries and for some communities in other 
countries, statistics become available, through its use, which would 
not be published at all if vital statistics offices were absolutely restricted 
to the use of the long list. In other words, the abridged list is a whole 
lot better than none at all. 

I cannot agree with Dr. Roesle that “a collective group labelled
‘other diseases’ must be regarded as illogical in a system of
classification.” In this connection, we must remember that every table is a
part of the public health bookkeeping. A certain number of deaths have 
occurred in a given country, or state, or city. Each and every one of 




















































these deaths constitutes a unit which is a part of the grand total. 
Each and every title of the International List constitutes a “pigeon 
hole” into which some of these deaths are thrown. The majority of 
these pigeon holes (for example, typhoid fever, diphtheria, cancer of 
the breast, appendicitis) relate to definite diseases, and the great 
majority of the deaths can be classified under these definite headings. 
But, in a practical working list which statistical compilers must use, 
it is impossible to provide a separate classification unit for every disease 
and for every form of violence. Consequently, it is necessary to select 
the most important diseases and to assign to each one a definite 
“pigeon hole.” When the resulting number of “pigeon holes” has
become so large that it is impracticable for a compiling office to use it 
if it is further extended, it then becomes necessary to provide for what 
is known as a residual title. The caption of this residual title usually 
begins with the word “other;” and under it, are classified those deaths 
which do not fit into any of the more definite categories. Now it 
would be utterly wrong to classify under the definite title headings, 
deaths which were not caused by the diseases to which those title 
headings relate, and it is, therefore, obviously necessary to have these 
residual titles in order that all the deaths may be included, and that the 
table may “balance.”
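The bookkeeping described above can be sketched in a few lines: every death goes either to one definite title or to the residual title, so the table always balances. The definite titles below are the examples named in the text; the reported causes and the function name `tabulate` are hypothetical, and a real compiling office would map many reported terms to each title rather than match names exactly.

```python
from collections import Counter

# Definite titles ("pigeon holes") taken from the examples in the text.
DEFINITE_TITLES = {"typhoid fever", "diphtheria", "cancer of the breast", "appendicitis"}
RESIDUAL_TITLE = "other diseases"


def tabulate(reported_causes):
    """Count deaths by title; causes with no definite title go to the residual title."""
    table = Counter()
    for cause in reported_causes:
        title = cause if cause in DEFINITE_TITLES else RESIDUAL_TITLE
        table[title] += 1
    return table


# Hypothetical returns for one locality.
deaths = ["diphtheria", "appendicitis", "gout", "diphtheria", "pleurisy"]
table = tabulate(deaths)
print(dict(table))

# Because of the residual title, every death is counted exactly once.
assert sum(table.values()) == len(deaths)
```

Dropping the residual title would leave the two hypothetical deaths from gout and pleurisy without a place in the table, which is the imbalance the residual titles exist to prevent.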

Even the detailed list follows this principle, the difference between it 
and the abridged list being that the residual groups are smaller. For
example, let us take Group V, which relates to diseases of the respiratory
system. This group is made up of eleven rubriques, ten of which cover 
definite morbid conditions or groups of very closely related diseases. 
Then we come to the eleventh rubrique, which is called “other diseases 
of the respiratory system” and under which are included hundreds of 
other respiratory conditions. Now it would be utterly impracticable 
for any compiling office to attempt to segregate statistically the deaths
reported from each. In consequence, it becomes a necessity to provide 
for this group of “other” diseases; for deaths are actually reported 
from each of them and every such death must be counted. In other 
words, these catch-basket title headings are a necessity, for we must 
have some place for the residuum of reports which we cannot classify 
under any of the really definitive headings which it is practicable to 
include in a practical working list of causes of death. I agree with Dr. 
Roesle that it is unfortunate, in the case of the abridged list, that such 
a considerable fraction of the total is inevitably included in the
“catch-basket.”

The questions which Dr. Roesle raises under his discussion of “the
principles of the arrangement of groups of diseases” might well be



















brought before the Commission for the Revision of the International
List, which will meet, presumably, in Paris in 1929. The proposals that 
he makes are so radical that if adopted, they would change the make- 
up of the International List so completely that it would be impossible 
to make comparisons with the mortality statistics of the past. Whether 
they are necessary, advisable or practicable, rests entirely with the 
International Commission to decide. It is my personal belief that Dr. 
Roesle will find the best opinion from practically all of the countries 
lined up against him with respect to these proposals. 

We all admit the weak point in mortality statistics which he stresses, 
namely, joint causes of death. A number of us in this country have 
been struggling with this problem ever since 1900. A committee of 
the Vital Statistics Section of the American Public Health Association 
is now working on this very matter; but this committee fully recognizes 
the fact that it is not practicable to compile and publish, each year, 
statistical tables showing every cause of death involved. The most that 
the Committee hopes for is that the Census Bureau and some of the 
states will undertake, every five years or so, the publication of such a 
table. It is obvious that if these offices were to attempt this as an 
annual undertaking, their work on the primary tables would be so 
delayed that they could not be published for years subsequent to the 
period to which they related. Timeliness of publication should not be 
subordinated to any other consideration. 


While the International List of Causes of Death does not contain a
grand group definitely labelled “death by violence,” it is, nevertheless,
true that Group XIV, “External Causes,” relates exclusively to such
deaths in contradistinction to those chargeable to morbid conditions.

It is true, as Dr. Roesle states, that accidents are frequently factors
in causing disease, and it is doubtless a fact that in this country and in all
other countries physicians, in certifying causes of death, sometimes
report only the disease and fail to mention that it was
superinduced by violence. Personally, I do not believe that this occurs
very frequently, in the United States at least; in fact, I believe that
the number of such cases in which the violence occurred shortly before
death is negligible. On the other hand, it is possible
that where injuries were fairly remote, physicians often fail to state
that injury played any part in causing death. And in these latter cases,
the consensus of opinion, in this country at least, is that the proper
place to classify the death is not under violence but under the disease.
I have in mind a case where, a few years ago, the death of a Civil War 
veteran was reported as due to a certain bone disease which, the physi- 
cian noted, the patient had had ever since he had received a bullet 
wound in 1864. Surely, for purposes of mortality statistics, it would be 
entirely wrong to classify such a death as due to violence. While this 
is rather an extreme example, the same principle should apply in cases 
where the violence was not so remote. It would be in order for the
next International Commission to rule on this question so that we
might, by international agreement, fix a time limit within which
an injury would be given preference over a resulting disease in
classifying causes of death.

Another point that should be definitely settled relates to cases where 
the violence is very slight. For example, take a pin scratch
which becomes infected, after which the patient dies of septicemia. It is a
grave question in my mind whether the slight injury in such a case 
should be classified as the chief cause of death. In this country it is 
the practise of registration offices to make violent deaths, as a general 
rule, preferred causes over disease, in compiling mortality statistics. 

Violent deaths are reported quite accurately in the United States
and, I believe, in England. I am inclined to think, from what Dr.
Roesle says, that our reports of such deaths are much better than those in
Germany. In this country, the Census Bureau, state and municipal
health departments, and some of the large insurance companies query
very thoroughly reports of death which are apt to conceal violence.
Outstanding examples are “septicemia” and “peritonitis” with no
further qualification; there are many others. But, as a matter of fact,
we do not often find, when we get corrections, that violence was involved.
Many reports of “septicemia,” for example, when corrected
statements are obtained from physicians, go ultimately into
such titles as diseases of the pharynx and tonsils, or into the puerperal
group; the “peritonitis” cases are allocated, in the end, very largely
to tuberculous peritonitis, appendicitis, gastric ulcer, diseases of
the liver, and puerperal conditions. In the annual mortality reports
of the Bureau of the Census, a table is published each year showing
the result of correspondence with physicians concerning original
reports of unsatisfactory statements of causes of death. The last column
of this table shows the titles of the International List which are
principally affected by the replies received from physicians. Inspection of
this column will show that the transfer of deaths originally certified
as due to disease has very little effect upon the totals for violence.

Neither would it be at all practicable to “remove from the list”
such causes as “pyemia” and “septicemia.” As a matter of fact, such
reports are often received from physicians who either do not know 
or do not state the real causative factor. Letters of inquiry often elicit 
no further data. The only course that remains open to the compiler is 
to charge the deaths to the disease originally reported by the physician. 

As long as it is not practicable to compile mortality tables covering 
all the causes of death involved in every case, just so long will it be 
impossible to determine how many times any one cause was involved 
—no matter whether that cause be a form of violence or a disease. It 
simply cannot be done so long as we have to classify polynomial re- 
turns on a monomial basis. 

It is entirely true, as Dr. Roesle states, that life tables are not
“worthy of consideration from the biological point of view.” But the
life table is not concerned “with the biological point of view.” Its
function is to serve as a source of information to public health officials,
actuaries, sociologists, vital statisticians, etc., concerning the expectation
of life at birth and at any given age. A life table, very properly, is
based on the mortality from all causes of death as a group and not on
the “natural” or “violent” causes separately. It is possible to determine
the effect of fatal accidents on longevity, and doing so would be no
more difficult than finding the effect of tuberculosis, or of any
other disease. In 1920, for example, the Metropolitan Life Insurance
Company determined the effect of tuberculosis in shortening the life
expectation of its industrial policyholders. It found that the after-lifetime
of white males was reduced by nearly three and one-half years,
and that of colored males by approximately five years, because of tuberculosis.
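The kind of calculation referred to here, measuring how much one cause shortens the expectation of life, can be sketched with a toy life table. This is a minimal illustration, not the Metropolitan's procedure, which the note does not describe. It assumes one-year death probabilities `q`, deaths falling on average half-way through each year, and a common competing-risk approximation in which removing a cause responsible for a fraction `share` of the deaths at an age turns the survival probability p into p ** (1 - share). The function names and the sample rates are hypothetical.

```python
def expectation_at_birth(q):
    """Expectation of life at birth from one-year death probabilities q[0], q[1], ...

    Those who die in a year are credited with half a year of life; the last
    entry of q should be 1.0 so that every member of the cohort is accounted for.
    """
    alive = 1.0
    years = 0.0
    for qx in q:
        dying = alive * qx
        years += alive - dying / 2  # survivors live the whole year, decedents half of it
        alive -= dying
    return years


def remove_cause(q, share):
    """Death probabilities with one cause eliminated.

    share[x] is the (hypothetical) fraction of deaths at age x due to the cause;
    the cause is assumed to act independently of all others, so the survival
    probability p becomes p ** (1 - share).
    """
    return [1.0 - (1.0 - qx) ** (1.0 - s) for qx, s in zip(q, share)]


# Toy table, not real data: 2 per cent mortality at every age to 79, closing at 80.
q = [0.02] * 80 + [1.0]
share = [0.10] * 80 + [0.0]  # hypothetical: one cause behind a tenth of deaths before 80
gain = expectation_at_birth(remove_cause(q, share)) - expectation_at_birth(q)
print(f"years added by eliminating the cause: {gain:.2f}")
```

The same two functions would serve equally for accidents or for tuberculosis; only the `share` column changes.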

Deaths by violence make up, let us say, about one-twelfth of the 
total. There is no valid reason why persons interested in the biology of 
a life table should not conduct investigations relating to the remaining 
eleven-twelfths, from their special biological standpoint. 
















THE DETERMINATION OF PAST AND PRESENT 
SECULAR TRENDS 


By Lincoln W. Hall, University of Pennsylvania


In a previous article¹ a method was devised for the purpose of computing
secular trends for economic time series as quickly as the data
were obtained. It is the purpose here to extend this method so that 
both past and present secular trends can be systematically calculated 
for any type of time series. 

There are two aspects to the determination of a secular trend. In 
the first place, there is the problem of securing the secular trend of a 
series for a number of past years. This aspect would present no difficulty
if all series had trends which proceeded gently in some one
direction and if the rate of change of the trend were nearly uniform and
remained so. In this case a straight line or some form
of parabola fitted to the data would determine the secular trend,
usually with very little difficulty. Unfortunately, however, much, if
not most, of our data do not follow any such even course, particularly 
through the recent war years. Our data, being of the historical variety 
and continuing indefinitely, do not necessarily follow any set course and 
may and do alter the direction of their trend as well as their rate of 
change. A glance at the charts will show the nature of this difficulty. 
For example, in the case of Bradstreet’s prices, the trend has a negative 
slope from 1892 to 1896; a positive slope from 1897 to 1914 although 
not at a constant rate; from 1915 to 1920 a positive slope at a greatly in- 
creased rate of change; during 1921 the series fell to a new level; and 
from 1922 to 1924 another positive slope. If we desired to find the 
trend of these data by fitting some curve from 1892 to 1924, we would 
have to obtain a curve which would have three irregular loops. This 
would be very difficult because the fit would need to be very close or 
else it would be useless, since our primary purpose in calculating a 
secular trend is finally to isolate the short time cyclical fluctuations. 
The other series represent different varieties of the same condition. 
The overdrafts series has first a negative trend and then a positive 
trend, while cash in vault has first a positive trend and then a nega- 
tive trend. 

¹ “A Moving Secular Trend and Moving Integration,” this Journal, March, 1925.

[Chart I: Bradstreet’s Prices. Chart II: Overdrafts. Each chart shows the original data and the secular trend.]

When the above situation has been encountered in the study of a
particular series, it has not been usual, because of the difficulties already
mentioned and the uncertainty of the results, to attempt to fit one
curve to the data. Instead, two other devices have been employed:
the first has been to fit several straight lines or curves (usually straight
lines) to the several parts of the data,¹ and the second has been to fit
several straight lines or curves to the logarithms of the original data.²
Logarithms have been used for the purpose of smoothing out the fluctuations,
so that even though straight lines would not adequately fit the
original data, they might serve the purpose for the logarithms. However,
logarithms are of little or no help with series such as Bradstreet’s
prices. With either of these methods, difficulty is apt to be
encountered in handling the data at the points where one curve
stops and the next begins. Neither method provides any systematic
means of determining secular trends for any series that may be
encountered.

The second aspect of the secular trend problem is the calculation of 
present trends. Most economic time series represent continuing phe- 
nomena, and as such we desire to measure them indefinitely. Further- 
more we desire to make these calculations as quickly as the data are 
gathered. We do not want to have to wait until the data are several 
years old before we can obtain a close approximation to the secular 
trend inherent in the data. In this phase of the problem the two meth- 
ods mentioned above are useless, even though they may have served 
adequately for past data, since none of the curves can be safely extended 
into the present to serve as a secular trend from which we expect to 
measure the cyclical fluctuations. We are seeking to find, if possible, 
a method which will permit us to measure systematically the secular 
trend of any series which we may desire to study, and not only to meas- 
ure this trend for past time but for present time as well, so that we can 
make continuing calculations on this series as quickly as the data are 
secured. 

In the previous article a method was devised for the determination of 
present trends by calculating a separate straight line equation for each 
ordinate of secular trend, and using as the ordinate of trend the last 
point in a fit extending over a period of three years.³ This device is
not particularly laborious when only a few ordinates are to be calcu- 
lated as is usually the case in present trends. However, if we should 
seek to use this method on a number of years of past time, we should 
find the work exceedingly tedious and laborious. Furthermore the 
trend found in this way would lack the smoothness which we desire. 


¹ For an illustration of this method see the writer’s A Study of the Cyclical Fluctuations Occurring in the
National Bank System During the Years 1903 to 1921, p. 25.

² See Vanderblue and Crum, “The Relation of a Public Utility to the Business Cycle,” Harvard Business
Review, July, 1924.
³ The reasons for this procedure are stated in the previously mentioned article. See footnote, p. 206.


Hence this device, while useful for present trends, is not particularly 
desirable for past trends, since we want a method which does not in- 
volve a prohibitive amount of labor and which can easily be made to 
yield a smooth trend line. 
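The earlier per-ordinate device can be sketched as follows. This is a minimal sketch under stated assumptions: quarterly data, a three year (12-quarter) period, and an ordinary least-squares line with x counted from 0 at the first quarter of each period. The function names are hypothetical. Every trend ordinate needs its own fitted equation, which is what makes the device laborious over many years of past data.

```python
def fit_line(ys):
    """Least-squares line y = a*x + b through ys, with x = 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    xbar = (n - 1) / 2
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in range(n))
    sxy = sum((x - xbar) * (y - ybar) for x, y in enumerate(ys))
    a = sxy / sxx
    return a, ybar - a * xbar


def per_ordinate_trend(series, window=12):
    """One separate fit per quarter: for every quarter t with a full period of data
    behind it, fit a line to series[t - window + 1 : t + 1] and keep only its value
    at the last point of that period."""
    trend = {}
    for t in range(window - 1, len(series)):
        a, b = fit_line(series[t - window + 1 : t + 1])
        trend[t] = a * (window - 1) + b
    return trend
```

For a series that is itself a straight line, every fitted ordinate reproduces the data exactly, which is a convenient check on an implementation.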

It should be noticed at the outset that a series of connected curves 
possesses no essential advantage over a series of connected straight 
lines provided that the straight lines are short enough. This is evident 
from the fact that a curve is the limiting position of a series of con- 
nected straight lines as the lines become infinitesimally short and in- 
finitely great in number. Therefore we can secure a close approxima- 
tion to a continuous smooth curve by means of a group of connected 
straight lines if only the lines are short enough, and for secular trend 
purposes the lines do not have to be unreasonably short. For this 
reason the fitting of curves, with the much greater computation it requires, need
not be considered, and our attention may be devoted entirely to the
fitting of straight lines. This procedure may at times introduce error, 
but it will be evident from the charts that the error will be small, and 
when the secular trend ordinates are used to secure the cyclical fluctu- 
ations (particularly in relative form) this error will have no significance 
in determining the results. 
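The claim that short connected straight lines can stand in for a curve is easy to check numerically. In this sketch a sine arc stands in for a smooth trend (an arbitrary choice, not one of the article's series). The broken line joins the curve's values at equally spaced points, and the largest gap between the two is measured as the number of segments grows. The function name is hypothetical.

```python
import math


def max_chord_gap(f, start, stop, segments, samples=50):
    """Largest vertical gap between f and the broken line that joins f at
    segments + 1 equally spaced points on [start, stop] (checked at sample points)."""
    h = (stop - start) / segments
    worst = 0.0
    for k in range(segments):
        x0 = start + k * h
        y0, y1 = f(x0), f(x0 + h)
        for j in range(samples + 1):
            x = x0 + j * h / samples
            chord = y0 + (y1 - y0) * (x - x0) / h  # point on the straight segment
            worst = max(worst, abs(f(x) - chord))
    return worst


for segments in (2, 4, 8, 16):
    print(segments, round(max_chord_gap(math.sin, 0.0, math.pi, segments), 4))
```

For a smooth curve the gap shrinks quickly, roughly fourfold for each doubling of the number of segments once the segments are short, which is the sense in which the lines need not be unreasonably short.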

We are now ready to calculate a group of connected straight lines 
which will be our secular trend, both past and present. If we should 
choose some period, say three years, we could very easily fit a straight 
line to any three years of original data. From the equation so obtained 
we could calculate the ordinate corresponding to the last item in the fit. 
This would be the twelfth ordinate if a three year period of quarterly 
data were used, and the ordinate would come at the fourth quarter of the 
third year if, for convenience, we considered that the fit started at the 
first quarter of the first year. We could repeat this process by fitting 
the three years which ended in the year succeeding our previous fit. 
Calculating the equation for this fit, we could then find the ordinate of 
secular trend for the fourth quarter of this third year. This process 
would give us two ordinates of secular trend which would be one year 
apart. Since two points fully determine a straight line, we can connect 
the two ordinates which we already possess by a straight line and com- 
pute the intervening ordinates of secular trend. It would not be neces- 
sary to obtain the actual equation connecting these two points. Since 
we are connecting them with a straight line and a straight line has a 
constant difference between two successive ordinates, it is only neces- 
sary to take the difference between our two points and to divide this 
difference by four—for quarterly data—and add the resulting amount 
to the first point to get the first intervening ordinate, then add it again 
to get the next ordinate and so on until the fourth addition will bring us 
to our second point. 
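The procedure just described (one fitted equation per year, then straight-line filling of the three intervening quarters) might be sketched like this. It assumes quarterly data whose first item is the first quarter of a year, so that indexes 11, 15, 19, ... fall on fourth quarters. The function names are hypothetical and the fit is an ordinary least-squares one.

```python
def fit_line(ys):
    """Least-squares line y = a*x + b with x = 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    xbar = (n - 1) / 2
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in range(n))
    a = sum((x - xbar) * (y - ybar) for x, y in enumerate(ys)) / sxx
    return a, ybar - a * xbar


def connected_line_trend(series, window=12, step=4):
    """Secular trend as connected straight lines.

    A fit over the last `window` quarters gives one ordinate at each fourth
    quarter (every `step` items); the quarters in between are filled by adding
    one `step`-th of the difference between successive ordinates.
    """
    knots = {}
    for t in range(window - 1, len(series), step):
        a, b = fit_line(series[t - window + 1 : t + 1])
        knots[t] = a * (window - 1) + b  # value at the last point of the fit
    ts = sorted(knots)
    if not ts:
        return {}
    trend = {}
    for t0, t1 in zip(ts, ts[1:]):
        increment = (knots[t1] - knots[t0]) / step
        for k in range(step):
            trend[t0 + k] = knots[t0] + k * increment
    trend[ts[-1]] = knots[ts[-1]]
    return trend
```

Only one equation is solved per year; the other three ordinates cost a single addition each, which is where the saving over the per-ordinate device comes from.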

This process can be made clearer if we follow an illustration. In
Bradstreet’s prices the ordinate for the fourth quarter of 1902 was
secured by fitting a straight line to the original data for the years 1900,
1901 and 1902 (x = 0 at the first quarter of 1900). The resulting equation
was y = −.003x + 7.80, and this equation was used only for one ordinate,
the fourth quarter of 1902, which was the last point in the fit. By
substituting 11 for x, we calculated this ordinate to be 7.77. We next
fitted a straight line to the data for the years 1901, 1902 and 1903. This
equation, which was y = .042x + 7.56, we used only for the purpose of
calculating the ordinate for the fourth quarter of 1903, which turned out
to be 8.02. We then had two ordinates of our secular trend, 7.77 and 8.02,
which were one year apart, and we obtained the three intervening
ordinates by taking the difference between 7.77 and 8.02, or .25,
dividing it by 4, and adding this result, .06+, to 7.77, which gave 7.83+,
adding it again, which gave 7.89+, and so on until the fourth addition
brought us to 8.02. This process is equivalent to fitting a straight line
to the two points 7.77 and 8.02.
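The arithmetic of the illustration, written out. The ordinate 7.77 follows from the first equation only if its slope is negative (with +.003 the result would be about 7.83), so this sketch reads the slope as −.003; x = 11 is the fourth quarter of the last year in each fit.

```latex
\begin{aligned}
y_{1902,\mathrm{IV}} &= -0.003(11) + 7.80 = 7.767 \approx 7.77,\\
y_{1903,\mathrm{IV}} &= 0.042(11) + 7.56 = 8.022 \approx 8.02,\\
\Delta &= \frac{8.02 - 7.77}{4} = 0.0625,\\
7.77 + k\Delta &= 7.8325,\ 7.895,\ 7.9575,\ 8.02 \qquad (k = 1, 2, 3, 4).
\end{aligned}
```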

We could continue the above process indefinitely by finding an equa- 
tion for each year and from each equation securing one ordinate of 
secular trend, and then connecting all these ordinates by short straight 
lines, each one year in length. This would give us a secular trend con- 
sisting of a group of short connected straight lines which would ap- 
proach a smooth curve in contour. This trend could be continued 
right up to the present moment, or as far up to the present as the data 
could be obtained. If it were undesirable to wait until the end of the 
year in order to secure an ordinate of trend which would determine the 
intervening ordinates, we could compute temporarily a separate equation
for each of the intervening ordinates as soon as the data were
obtained.¹ For example, suppose that the last ordinate of trend which we
had was for the last quarter of 1923, and we had just obtained the orig- 
inal data for the first quarter of 1924. If we did not wish to wait until 
the last quarter of 1924 to determine an ordinate of trend correspond- 
ing to the first quarter of 1924, we could calculate a separate equation 
for the first quarter of 1924 just as we had done for the last quarter of 
1923. We could do the same for the second and third quarters of 1924, 
and when we had finally secured the equation for the fourth quarter of 
1924, we could adjust the intervening ordinates, if it were desirable, by 
connecting 1923 and 1924 with a straight line. Through the use of 
this last device it is evident that our ability to calculate present secular 


¹ See footnote, p. 206.

trend ordinates depends only upon our ability to secure the original
data.¹
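The provisional treatment of the current year can be sketched in the same terms. Assume `last_knot` is the index of the most recent fourth-quarter ordinate and `knot_value` its value. Each later quarter first gets its own end-point fit, and once the next fourth quarter arrives, the in-between quarters are re-set onto the straight line joining the two yearly ordinates. Names are hypothetical; the fit is ordinary least squares over 12 quarters.

```python
def endpoint(series, t, window=12):
    """Value at quarter t of a least-squares line fitted to the `window` quarters ending at t."""
    ys = series[t - window + 1 : t + 1]
    n = len(ys)
    xbar = (n - 1) / 2
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in range(n))
    a = sum((x - xbar) * (y - ybar) for x, y in enumerate(ys)) / sxx
    return ybar + a * (n - 1 - xbar)


def current_year_trend(series, last_knot, knot_value, window=12, step=4):
    """Trend ordinates for the quarters after last_knot that are already in hand."""
    latest = min(len(series) - 1, last_knot + step)
    out = {t: endpoint(series, t, window) for t in range(last_knot + 1, latest + 1)}
    if latest == last_knot + step:
        # The year is complete: replace the provisional values by the straight
        # line joining the two yearly ordinates.
        increment = (out[latest] - knot_value) / step
        for k in range(1, step):
            out[last_knot + k] = knot_value + k * increment
    return out
```

With only part of the year in hand the function returns provisional end-point values; with the full year it returns the adjusted, evenly spaced ordinates.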

It is not always necessary to use a three year period to secure the
yearly ordinates of trend; any period might be used. Instead of using
the original data of the years 1900, 1901, and 1902 with which to secure
the ordinate of trend for the fourth quarter of 1902, we might have used
the years 1897, 1898, 1899, 1900, 1901 and 1902, or any other group of
years ending in 1902. However, various periods cannot be used
indiscriminately in this way without the ordinates of trend losing all
connection with the data. The three year period in this study has been
used as the standard period because there are reasons (explained
elsewhere) for supposing that it will always give a close approximation to
the true trend;² also, being the shortest period that could logically be
used, it is highly sensitive to changes in the direction and slope of the
trend, and it is also easy to compute. For these reasons,
periods other than the three year period have been used only for
smoothing purposes. The fundamental movement of the trend has
always been computed from the three year period, and the results
of the three year period have then been smoothed by the use of longer
periods, among which the six year period has been found the most useful.
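Why the three year period is sensitive, and why longer periods smooth, shows up in the weights that a straight-line fit implicitly gives the observations when only its last ordinate is kept. Unlike a moving average, which weights every observation equally, these weights rise steadily toward the end of the period. The sketch assumes ordinary least squares over equally spaced quarters; `endpoint_weights` is a hypothetical helper.

```python
def endpoint_weights(n):
    """Weights w such that the least-squares line through n equally spaced
    observations y[0..n-1], evaluated at the last one, equals sum(w[i] * y[i])."""
    xbar = (n - 1) / 2
    sxx = sum((x - xbar) ** 2 for x in range(n))
    return [1 / n + (x - xbar) * (n - 1 - xbar) / sxx for x in range(n)]


three_year = endpoint_weights(12)  # 12 quarters
six_year = endpoint_weights(24)    # 24 quarters
print("largest weight, 3-year period:", round(max(three_year), 3))
print("largest weight, 6-year period:", round(max(six_year), 3))
print("earliest weight, 3-year period:", round(three_year[0], 3))
```

The last quarter carries roughly 0.29 of the weight in a three year fit but only about 0.16 in a six year fit, so the shorter period reacts faster to a change in direction while the longer one averages it out; the earliest quarters even receive small negative weights.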

If we are dealing with data which do not fluctuate very much so that 
the direction and slope of the past trend are evident, it might not be 
necessary to compute an ordinate from a separate equation every year. 
In some cases it might be sufficient to compute separate ordinates only 
every two, three, or four years, or it might be possible to compute only 
two ordinates from equations through a stretch of ten or twelve years. 
In such cases, the principle of connecting the computed ordinates by 
straight lines is the same as above only there are fewer such ordinates to 
connect. 
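When ordinates are computed only every few years, the joining step is unchanged; only the spacing between computed ordinates varies. A minimal sketch, assuming the computed ordinates are supplied as a mapping from quarter index to value (how they were obtained, for example by end-point fits, is left outside the function, whose name is hypothetical):

```python
def join_ordinates(knots):
    """Connect computed trend ordinates {quarter index: value} by straight lines,
    returning an ordinate for every quarter from the first knot to the last."""
    ts = sorted(knots)
    trend = {}
    for t0, t1 in zip(ts, ts[1:]):
        slope = (knots[t1] - knots[t0]) / (t1 - t0)  # constant difference per quarter
        for t in range(t0, t1):
            trend[t] = knots[t0] + slope * (t - t0)
    if ts:
        trend[ts[-1]] = knots[ts[-1]]
    return trend


# Ordinates one year apart (the illustration's 7.77 and 8.02) and eight years apart.
yearly = join_ordinates({11: 7.77, 15: 8.02})
sparse = join_ordinates({11: 100.0, 43: 140.0})
```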

¹ This method differs essentially from the moving average, since the moving average gives equal weight
to all the observations, while the above method does not, which makes it possible to place the calculated
ordinate at the end of the period.

² See footnote, p. 206.

The accompanying charts illustrate the use of this method to secure
past and present secular trends. The data were chosen because they
contain different types of fluctuations, all of which were either very
difficult or impossible to handle by the usual methods. In
“Bradstreet’s” prices a three year period was used from 1894 to 1904, a
six year period from 1905 to 1914, a three year period from 1915 to 1921,
a five year period for 1922, and a three year period for 1923 and 1924.
“Overdrafts” has a three year period from 1900 to 1904, a six year
period from 1905 to 1915, an eight year period for 1916, and a three
year period from 1917 to 1924. “Cash in Vault” has a three year
period from 1899 to 1902, a six year period from 1903 to 1912, a nine
year period for 1913, a six year period from 1914 to 1916, a three year
period for 1917, a four year period for 1918, a seven year period for 1919,
and a three year period from 1920 to 1924.¹

[Chart III: Cash in Vault. Original data and secular trend.]

An examination of the charts will show how closely the secular trend 
follows the original data. Since the cyclical fluctuations are short time 
changes, extending only over three or four years at a time, it is neces- 
sary for the secular trend to follow the original data closely if we desire 
to use the secular trend for the purpose of measuring cycles. If the 
secular trend did not follow the original data closely, then part of the 
cyclical fluctuations would be submerged, or possibly concealed, and a
clear picture of these changes could not be obtained.²
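One common way to read the cycles off such a trend, and presumably what is meant earlier by cyclical fluctuations “in relative form,” is the percentage deviation of each original item from its trend ordinate. The formula is an assumption here, since the article does not state it; the function name and the figures are hypothetical.

```python
def relative_cycle(series, trend):
    """Percentage deviation of each original item from its trend ordinate.

    series: list of quarterly values; trend: {quarter index: trend ordinate}.
    Quarters without a trend ordinate are skipped.
    """
    return {t: 100.0 * (series[t] - v) / v for t, v in sorted(trend.items())}


# Hypothetical figures: a cycle of plus or minus 5 per cent around a rising trend.
trend = {t: 100.0 + 2.0 * t for t in range(8)}
series = [trend[t] * (1.05 if t % 4 < 2 else 0.95) for t in range(8)]
cycles = relative_cycle(series, trend)
```

Because the deviations are relative, a cycle of the same proportional size reads the same whether the trend is low or high, which is why a trend that follows the data closely matters more than its exact shape.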


¹ The periods such as five years, seven years, eight years, etc., were used only for the purpose of smoothing.
The three year period determines the fundamental trend, and the other periods were used arbitrarily
in order to give the trend a slightly smoother contour. This need not be done; it was done here primarily
for purposes of illustration. The retention of a three year period throughout would have made
no significant change in the calculation of the cycles.

² The data used for illustration are Bradstreet’s index of price levels, overdrafts of national banks,
and cash in vault of national banks. All the data are used on a quarterly basis.






THE VARIABILITY IN THE DIFFERENCE BETWEEN THE MOR- 
TALITY OF MEN AND WOMEN IN COPENHAGEN DURING 
THE LAST TWENTY YEARS 


It is well known that the mortality of men and women does not run parallel 
through the age classes. On the other hand the difference between the mortality 
of men and women in Copenhagen for a number of years has previously been 
fairly constant from year to year. The difference between the total mortality 
quotients¹ for men and women in the six five-year periods before 1912, for example,
has only varied from 23 to 26 per cent of the women’s mortality. In other 
words, the men had an excess mortality of about a quarter of that of the women. 

It is, therefore, remarkable that, after having fallen to 17 per cent in the five
years 1913-1917, the difference dropped to 8 per cent of the women’s mortality in
the next five years, 1918-1922, and in the last two years, 1923-1924, was lower
still, at 4 per cent (see Table I).²

TABLE I
DIFFERENCE BETWEEN THE MORTALITY OF MEN AND WOMEN IN COPENHAGEN
1903-1924

                 Mortality         Percentage Excess of Men’s
Period          Men    Women       Mortality over Women’s Mortality
1903-1907       17.7   14.0        26
1908-1912       16.4   13.2        24
1913-1917       15.1   12.9        17
1918-1922       13.3   12.3         8
1923-1924       11.9   11.4         4
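The last column of Table I is the men's excess expressed as a percentage of the women's mortality. A quick check of that arithmetic follows; the figures are the table's own, while the first two period labels are an inference from the table's 1903-1924 span in five-year steps.

```python
# Mortality of men and women by period, from Table I.
table_i = {
    "1903-1907": (17.7, 14.0),
    "1908-1912": (16.4, 13.2),
    "1913-1917": (15.1, 12.9),
    "1918-1922": (13.3, 12.3),
    "1923-1924": (11.9, 11.4),
}


def excess_percent(men, women):
    """Excess of men's over women's mortality, as a percentage of women's."""
    return 100.0 * (men - women) / women


for period, (men, women) in table_i.items():
    print(period, round(excess_percent(men, women)))
```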
TABLE II
PERCENTAGE EXCESS OF MEN’S MORTALITY OVER WOMEN’S MORTALITY FOR
THE DIFFERENT AGE CLASSES

                                  Age Class
Period       0-5   6-25   26-35   36-45   46-55   56-65   66 and over
1903-1907     30    10     39      61      80      85       26
1908-1912     22    12     36      56      77      80       22
1913-1917     29    11     33      55      58      59       18
1918-1922     34    14     16      17      27      39       13
1923-1924     38    17      0      16      11      44       14

¹ Annual Reports of the Medical Officer of Health.
² This has already been proved for the whole of Denmark. See Ugeskrift for Laeger, 1925, p. 900.

The first way which suggests itself of further analyzing this striking alteration
in the relation between the mortality of the two sexes is to investigate what
happened in the various age classes. This has been done in Table II. It will be
clearly seen that the fall in the amount of the excess occurred exclusively in the
age classes above 25 years, and that it was most marked in the three classes
between 26 and 55 years.

In the first of these age groups—26 to 35 years—in which the men formerly 
had an excess mortality of 39 per cent over that of the women, this has now 
completely disappeared. 

In the second group—36 to 45 years—the excess mortality has fallen from 61 
to 16 per cent, and in the third age class—46 to 55 years—the male excess 
mortality has decreased from 80 to 11 per cent. Above 55 years there is also a 
drop in the male excess mortality but the fall is not so enormous. 

The greatest fall in the difference between the mortality of men and women 
thus occurs in the age classes where experience shows that alcoholism has been 
rife. Has any great change, therefore, taken place during the investigated 
period, in the consumption of alcohol among the Danish population? 

The annual consumption of pure alcohol per capita of the population is shown 
in Table III. 

TABLE III 


THE ANNUAL CONSUMPTION OF PURE ALCOHOL IN LITRES PER CAPITA 
OF THE INHABITANTS DURING THE YEARS 1901-1923 








errr re 8.3 litres OR ie ae eee aa | 2.7 litres 
Ea os nh mare Gae a” RS ew mien wu eutudrecid TY We 
AER CSE ae 6.5 Diath cies weeniaauces 2.8 
RE: 34 








A very marked fall in the consumption from 7 or 8 litres per person, in the 
first series of years of the period examined, to less than 3 litres in the last few 
years is observed. 

If we now apply the mean standard¹ which Professor Harald Westergaard first
put forward in an address in Oslo in 1910, we find that the smaller consumption
in 1923, compared with the consumption during 1901-1905, corresponds to a saving
of about 46,000 years of life-time for the whole of Denmark’s population.
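Westergaard's mean standard, as quoted in the footnote to this paragraph (about 11 hours of life lost per litre of spirits and 25 minutes per half bottle of Bavarian beer), turns consumption into life-time by simple multiplication. The sketch below does only that conversion. It does not attempt to reproduce the 46,000-year figure, because the population used and the conversion from the pure alcohol of Table III to litres of spirits are not stated here. The function name is hypothetical.

```python
HOURS_PER_YEAR = 365.25 * 24  # 8,766 hours


def life_time_years(litres_of_spirits=0.0, half_bottles_of_beer=0.0):
    """Years of life-time corresponding to a given consumption under the standard:
    11 hours per litre of spirits and 25 minutes per half bottle of beer."""
    hours = 11.0 * litres_of_spirits + (25.0 / 60.0) * half_bottles_of_beer
    return hours / HOURS_PER_YEAR


# Litres of spirits corresponding to one year of life-time under this standard.
print(round(HOURS_PER_YEAR / 11.0))
```

Under this standard one year of life-time corresponds to roughly 800 litres of spirits, so savings of tens of thousands of years imply reductions of tens of millions of litres across a national population.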

Professor Westergaard found his mean standards chiefly from the elaboration 
of material drawn from the insurance societies, but he also stated that the mean 
length of life of the average man of 20 years of age in the Danish provincial towns
would increase by more than one year if deaths from alcohol could be eliminated. 

In Copenhagen the deaths from the causes in question (chronic alcoholism,
delirium tremens and mors in ebrietate) fell from 96 annually in the five years
1906-1910 to 12 annually in the five years 1920-1924.

How much has the mean length of life for a man of 20 increased simultane- 
ously? This is shown in Table IV. 

It will be observed that a young man of 20 now has the expectation of living
three and one-half years longer than during the five years 1906-1910. In the
case of a young woman, on the other hand, a far smaller extension of only half a
year of the mean length of life has taken place.

¹ Curtailment of the duration of life by about 11 hours for every litre of spirits consumed and by 25
minutes for every half bottle of Bavarian beer. Tidskrift for den norske Laegeforening, 1910, pp.
782, 827, 876 and 939.

TABLE IV
MEAN LENGTH OF LIFE IN COPENHAGEN

Years            1906-1910          1920-1924
of Age          Men    Women       Men    Women
  …             47.5   55.0        56.0   60.6
  …             54.8   60.9        58.7   61.7
  …             50.6   56.7        54.6   57.3
  …             46.2   52.3        49.9   52.7
  …             41.8   48.1        45.6   48.5
  …             37.6   43.8        41.4   44.3
  …             29.7   35.6        33.1   35.9
  …             22.4   27.7        25.0   27.6
  …             16.1   20.3        17.5   20.0
  …             10.8   13.4        11.4   13.1
  …              7.0    8.7         6.7    8.0

There are, of course, other factors which have played a part, for example, 
housing conditions and alteration in the working hours, but the task of assessing 
their importance is quite outside the purpose of this medico-statistical enquiry. 
The object here has merely been to point out how great the effect of the
considerable decrease in the consumption of alcohol may have been on the
expectation of life of men and on their power to perform productive work.

POVL HEIBERG

Copenhagen, Denmark 


THE NEW SERIES ON COMMONWEALTH OF PENNSYLVANIA 
DEPARTMENTAL STATISTICS 


A bulletin for public information has been published by the Department of
State and Finance, Harrisburg, Pennsylvania (1925, 323 pp.). This modest
statistical volume represents a most important departure in the statistical
literature, not only of the State of Pennsylvania, but of the entire country. As
far as the reviewer knows, it is the first effort on the part of any state of the
Union to gather into one volume statistical information concerning the entire state,
and thus may be considered the beginning of a development of statistical year
books, which are very necessary both for our states and our larger cities.

It is one of the tragedies of statistical science in this country that the results, 
as far as published information is concerned, are not at all commensurate with 
either the efforts or the costs connected with such efforts. This is particularly 
true of governmental statistics. The United States Government is the only one 
which has met this need by the publication of a statistical year book, initiated 
several decades ago. Originally almost limited to statistics of commerce 
(because of the accidental circumstance that the Bureau of Statistics was a part 
first of the Treasury Department and then of the Department of Commerce) 
it has gradually developed in scope, so as to embrace the results of all other 
statistical services of the Federal Government. Although it even now assigns too 
large a proportion of the volume to commercial statistics, it is the nearest approach
to what a statistical year book should be.

For states and cities, no such general statistical source has as yet been avail- 
able. The tremendous amount of effort spent in gathering statistics by states 
and cities is very largely wasted, because the results appear in state and municipal 
reports of departments, bureaus, commissions, boards, etc., which very few people
receive and fewer read. Yet precedents for such statistical annuals of
political subdivisions may be found throughout Europe and even in South Amer- 
ica. The various subdivisions of the German Empire are publishing them. 
Very large European and many South American cities have had them for many 
years. The collection of statistical information by states and by cities not only 
will have a technical value, but must play an important part in the improvement 
of our state and municipal governments. If one should, for instance, compare 
the statistical publications of Belgium with those of some of our states which, 
in population and certainly in economic resources, exceed those of Belgium many 
times, the comparison would not be in our favor. 

Like every new venture, Pennsylvania Departmental Statistics necessarily 
displays certain shortcomings, some of which probably were inevitable because 
of legal limitations, and others would probably permit of improvement even 
under present circumstances. The value of a statistical annual may be judged 
from the degree in which it meets the requirements as to: (a) Comprehensiveness 
of material; (b) Proper choice of material; (c) Proper distribution of space; 
(d) Proper arrangement of statistical matter; (e) Proper statistical presentation; 
(f) Table of contents and index to facilitate its use; (g) Timeliness of its publi- 
cation. 

As to the comprehensiveness of the information contained in the volume, much
is left to be desired, but this is probably due entirely to the legal conditions of its
publication. The volume is not yet entitled The Statistical Year Book, but only 
Departmental Statistics. It is limited to statistical data which either record the 
activity of the departments or were collected by the departments in the exercise 
of their duties. To make it a complete statistical annual, much information 
should be added from the United States census reports and other sources, and it 
may be hoped that the necessary amendatory legislation will be passed which 
will enable the editors to go beyond statistics directly gathered by the various 
departments. 

It is somewhat to be regretted that the financial statistics of the state were 
omitted from this volume. The reason for the omission given in the foreword 
is that such statistics appear in detail in the budget. Nevertheless, it would 
seem to be desirable that at least abstracts of the main data be included in this 
general report. 

Probably the conception of the volume, primarily as a collection of depart- 
mental reports rather than a systematic year book, is responsible for the some- 
what uneven distribution of space. One hundred and fifteen pages are devoted 
to the statistics collected by the Welfare Department, as against only 20 pages 
to the Department of Agriculture, and 23 to Labor and Industry. With all 
one’s professional enthusiasm for statistics of social work, evidently the space 

allowed does not correspond to the comparative importance of those subjects. 
Probably the reason lies in the comparative efficiency of the statisticians, and
the collective volume may perform a very useful service in indicating the weak 
spots in the statistical work of the separate departments of the state governments. 

The arrangement of the various topics in the book is not at all on logical lines, 
but seems to follow an alphabetic order of the departments of the government. 
As no table of contents is provided for some reason (though a very satisfactory
index partly compensates for that) this arrangement is not obvious.

The topics are taken up in the following order: Agriculture, banking, fisheries, 
forests, game, health, highways, insurance, Pennsylvania National Guard, mines, 
schools, public utilities, state and finance, state police, welfare. One is led to the 
suspicion that rivalry between the departments made such an arrangement 
inevitable, and one may hope that in the future this new publication will so 
establish itself that it may gain strength enough to disregard so unimportant a 
consideration. It is only due to the accidental character of our alphabet that 
fisheries, forests and game statistics are put together, but obviously health 
statistics should not be next to highways, nor school statistics wedged in between 
mines and public utilities. Moreover, many of these departments deal with 
correlated subjects, and it would be highly desirable that the entire statistical 
material be arranged in some logical order, with references given to departments, 
bureaus and statistical publications for each table. 

The same unevenness as in the amount of space allowed, is also noticeable in 
the quality of technical statistical methods of the various departmental reports. 
This is noticeable in the amount of space sometimes unnecessarily wasted because 
the tables are not arranged to the best advantage, considering the size of the book. 
It is noticeable in the indefiniteness and sometimes inaccuracy of table headings. 
For instance, Table 4 is entitled “Crop and Live Stock Production—Numbers 
and Value.” In this table are found statistics of horses, milk cows, other
cattle, swine, etc., but one is not certain whether the figures given represent “live
stock production” or “live stock census.” Table 29, entitled “Distribution of
Fish in Pennsylvania by Counties from December 1, 1922 to January 1, 1924,”
may raise in the mind of any one who is not an Izaak Walton, the question, what
is meant by distribution. Presumably, no census of the fish population of Lake 
Erie and various counties has been intended. Probably the table is meant to 
show a distribution from fisheries, but that should have been clearly explained. 
More errors of this kind are found in some departmental reports than in others, 
but if sufficient authority is given to the statistician in charge of the Year Book, 
they can be prevented and an influence established for improvement in the qual- 
ity of the statistical work in all the departments. 

However, all these criticisms are of very minor importance. In making such 
a volume possible, Dr. Clyde L. King, Secretary of the Commonwealth of Penn- 
sylvania, has made a very great contribution. His promise in a personal letter 
to the reviewer that the 1925 volume will appear shortly, will meet any possible 
criticism as to timeliness. One may only hope that this new venture will find 
its imitators in other states as well as in our larger cities. 

I. M. Rubinow


PROGRESS OF WORK IN THE CENSUS BUREAU 
SURVEY OF CURRENT BUSINESS


Doubtless most readers of the JOURNAL are well aware that the Department of
Commerce issues a publication called the Survey of Current Business, which is
prepared in the Bureau of the Census. As the current month of June marks the
fifth anniversary of the start of this enterprise, it seems an appropriate
time for a brief review of it. Unlike the other publications issued by
this Bureau, the Survey is not a compilation of original data, but it renders a 
service equally useful by bringing together under one cover and thus making 
readily available the data already existing or being compiled by various inde- 
pendent agencies, principally trade associations and government bureaus, the 
Census Bureau itself being one of the principal contributors. The class of data 
covered relates to current production, stocks on hand, consumption, prices, and 
many other indexes of the current or recent movements of industry and trade. 

The first issue brought together data supplied by 45 trade associations and 
private organizations, and 17 government agencies, bureaus, etc., and included 
about 271 business items. From this small beginning the publication has grown 
until it now contains about 1,500 items derived from 155 trade associations and 
49 government agencies. 

The information is given to the public in the following manner: 

(1) Semiannual numbers, dated February and August, presenting monthly 
data for the past two or three years and annual averages back to 1913 
where available, together with sources and complete explanation of all 
items, and index numbers on the more important items extending back 
to 1913. Descriptive text matter and charts are also included. 

(2) Monthly numbers, issued in the months intervening between the semi- 
annual numbers, presenting the figures for the latest months and corre- 
sponding periods of the previous year, percentage changes and cumula- 
tives, as well as the regular text matter and charts, and including special 
tables on new matter whenever available. 

(3) Weekly advance leaflets, sent automatically to all domestic subscribers 
of the Survey of Current Business, presenting such data as have been re- 
ceived during the previous week. These get the early figures into the 
readers’ hands at an earlier date than the complete bulletin can be made 
available. 

(4) Press statements containing the text matter in the advance leaflets, through
which the business trends as shown by the latest statistics can be quickly 
made known to the general public through the newspapers. 

(5) Weekly statements on the facts of general business for radio transmission. 

(6) Statement of weekly business figures, which is published for us by the 
Chamber of Commerce of the United States. 

(7) Special monthly statements on particular industries, for which the individ- 
ual returns are gathered in the Bureau of the Census, comprising not 
only the 18 industries for which individual data are collected by the 
Survey of Current Business Division (the more important of which 
include automobile production and bookings of fabricated structural 
steel), but as many more collected by other divisions of the Bureau 
(such as cotton ginnings and consumption, shoe and leather produc- 
tion, etc.). 
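The monthly numbers' "percentage changes and cumulatives" can be illustrated with a minimal Python sketch. The note does not spell out the formulas, so this assumes each month is compared with the same month of the previous year and that the cumulative column compares year-to-date totals; the function names and figures are hypothetical.

```python
def percent_change(current, previous):
    """Percentage change of `current` relative to `previous`."""
    return 100.0 * (current - previous) / previous


def year_over_year(this_year, last_year):
    """Compare monthly figures with the corresponding months of the previous year.

    `this_year` may be shorter than `last_year` (a year still in progress).
    Returns rows of (month number, monthly % change, year-to-date % change).
    """
    rows = []
    cum_this = cum_last = 0.0
    for month, (cur, prev) in enumerate(zip(this_year, last_year), start=1):
        cum_this += cur
        cum_last += prev
        rows.append((month, percent_change(cur, prev), percent_change(cum_this, cum_last)))
    return rows


# Hypothetical production figures (e.g. thousands of units), January onward.
last_year = [100, 100, 120, 130]
this_year = [110, 120, 114]
for month, monthly, cumulative in year_over_year(this_year, last_year):
    print(f"month {month}: {monthly:+.1f}% (year to date {cumulative:+.1f}%)")
```

Because `zip` stops at the shorter list, a partial current year is compared only with the matching months of the previous year.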

As a further aid to the use of these data on business statistics, there is in course 
of preparation a record book of business statistics, in which the various items 
regularly included in the Survey of Current Business will be shown month by 
month as far back as 1909, if available. This will aid students of business condi- 
tions in ascertaining seasonal trends, secular movements, etc., over a long period 
of time. 
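The record book is meant to help students measure seasonal and secular movements. One common way to do this with monthly data is the ratio-to-moving-average method, shown here as a minimal Python sketch rather than as the Bureau's own procedure (which the note does not describe). A centered 12-month moving average stands in for the trend, each month's figure is divided by it, and the ratios are averaged by calendar month. The series is assumed to begin in January and to cover at least two full years; the test data are synthetic.

```python
def centered_moving_average(series, period=12):
    """2 x `period` centered moving average (period must be even).

    Returns a list the same length as `series`, with None where the average
    is undefined (the first and last period/2 positions).
    """
    half = period // 2
    result = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]  # period + 1 values
        # The two end values get half weight, so the weights sum to `period`.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        result[t] = total / period
    return result


def seasonal_indices(series, period=12):
    """Seasonal index for each month of the cycle, scaled to average 100."""
    trend = centered_moving_average(series, period)
    ratios = [[] for _ in range(period)]
    for t, (value, avg) in enumerate(zip(series, trend)):
        if avg is not None:
            ratios[t % period].append(value / avg)
    raw = [sum(r) / len(r) for r in ratios]  # mean ratio per calendar month
    scale = period / sum(raw)  # force the indices to average exactly 100
    return [100 * x * scale for x in raw]


# Synthetic data: a rising trend times a known seasonal pattern, three years.
pattern = [0.9, 0.9, 1.0, 1.05, 1.1, 1.2, 1.1, 1.0, 0.95, 0.9, 0.9, 1.0]
series = [(100 + t) * pattern[t % 12] for t in range(36)]
print([round(i, 1) for i in seasonal_indices(series)])
```

The recovered indices come out close to 100 times the pattern used to build the data, which is the check one would want before applying the method to a real series.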

CENSUS OF AGRICULTURE

Bulletins giving the final figures of the 1925 census of agriculture by counties 
have been published for more than one-third of the states, as follows: California, 
Connecticut, Delaware, District of Columbia, Florida, Iowa, Kansas, Maine, 
Maryland, Massachusetts, Nebraska, New Hampshire, New Jersey, North 
Dakota, Rhode Island, South Carolina, Vermont, and Wisconsin. A number 
of others are in the hands of the printer. 

Several press summaries have been issued giving preliminary figures for the 
United States, by states. One of especial interest is that giving the number of 
farms reporting tractors and radio outfits. 

It was not practicable within the time and funds allowed for this census to 
tabulate the data on the basis of the township, as was urgently requested by 
agricultural colleges and organizations in various sections of the country. The 
Bureau, however, agreed to make a township tabulation for any state that might 
request it, the state paying the cost of the work. Tabulations of this kind have 
been made for four New England states, New Hampshire, Rhode Island, Con- 
necticut and Massachusetts, and others are being arranged for. 


BIENNIAL CENSUS OF MANUFACTURES, 1925


The manufactures canvass was begun promptly after the close of 1925, and 
returns have been coming in at about the same rate at which they were received 
at the last preceding biennial census (that for 1923). From present indications, 
it is likely that the canvass will not be brought to completion until after July 1, 
but in the meantime the publication of preliminary industry reports has been 
begun. In fact, the first preliminary report, covering the consumption of vege- 
table tanning materials, was issued under date of April 16. Other preliminary 
reports which have been issued include those on the manufacture of motor vehi- 
cles and rayon, which were issued in time for inclusion in the 1925 edition of the 
Commerce Yearbook. 

The process of tabulation is carried forward promptly as the returns come in, 
with a view to being able to issue a bulletin or press summary shortly after the 
receipt of the last schedule for a given industry or city. But, as at former cen- 
suses, the Bureau is handicapped by the dilatoriness of many of the manufactur- 
ers in making their returns and by their failure to answer the questions com- 
pletely and accurately. Many thousands of letters in regard to the correction of 
reports have already been written, and it is likely that the total number of such 
letters will exceed 100,000. 
CENSUS OF DISTRIBUTION
The matter of conducting, in connection with the next biennial census of
manufactures (that of 1927), a canvass of wholesale and retail mercantile
establishments is under consideration. If this work is undertaken, the canvass
will probably be made in coöperation with the United States Chamber of Commerce,
and the local chambers of commerce will aid the Bureau in collecting the data.


SESQUICENTENNIAL EXHIBIT 


The Bureau has prepared an exhibit for the Sesquicentennial at Philadelphia. 
This includes electrical tabulating machinery, charts and maps covering the 
various divisions of the work of the Bureau, publications, etc. One feature, 
which will probably be of considerable popular interest, is the device for showing, 
by successive flashes of light thrown on circles or disks of different colors, the 
occurring births, deaths, arrivals of immigrants, and departures of emigrants, 
and the resulting net additions to the population of the United States. Thus
the bystander may watch the population grow; and it may be of some interest
to the readers of the JOURNAL to know what this speedometer records: A
birth occurs every 12 seconds; a death, every 24 seconds; an immigrant arrives
every 1¾ minutes; an emigrant departs every 5¾ minutes; and there is a
resulting net addition of one unit to the population every 20 seconds. These
are, of course, averages. They are based on the records of the fiscal year 1925.
J. A. H. 
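The speedometer's last figure follows from the first four. A minimal Python sketch, taking the immigrant and emigrant intervals as 1¾ and 5¾ minutes, converts each average interval into a rate per second, nets births against deaths and immigrants against emigrants, and inverts the result. The dictionary keys and function name are illustrative only.

```python
# Average intervals for fiscal year 1925, in seconds.
# The immigrant and emigrant values assume intervals of 1 3/4 and 5 3/4 minutes.
INTERVALS_SECONDS = {
    "birth": 12,
    "death": 24,
    "immigrant": 1.75 * 60,
    "emigrant": 5.75 * 60,
}


def net_addition_interval(intervals):
    """Seconds between net additions of one person, given average event intervals."""
    rate = (1 / intervals["birth"] - 1 / intervals["death"]
            + 1 / intervals["immigrant"] - 1 / intervals["emigrant"])
    return 1 / rate


interval = net_addition_interval(INTERVALS_SECONDS)
print(f"one net addition every {interval:.1f} seconds")
# prints: one net addition every 20.7 seconds
```

About 20.7 seconds is consistent with the device's rounded figure of one net addition every 20 seconds; natural increase alone (births less deaths) would give one every 24 seconds.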


SALES QUOTAS 


THE NEW YORK DINNER MEETING OF THE AMERICAN STATISTICAL ASSOCIATION 


The dinner meeting of the American Statistical Association was held at the 
Aldine Club, 200 Fifth Avenue, on Friday evening, March 5, 1926, the subject of 
discussion being Sales Quotas. The meeting, held in conjunction with the Amer- 
ican Management Association, was ably presided over by Mr. Joseph H. Barber 
of the Walworth Company. One hundred forty-five persons were present. 

The first speaker of the evening was Mr. Henry G. Weaver, Assistant to the 
Director, Sales Section, General Motors Corporation, Detroit. In his remarks 
he emphasized the fact that sales quotas should be based upon potentiality 
rather than actuality; in other words, the company having merchandise to sell 
should interest itself, not so much in how many goods have been sold in a given 
territory, but rather in how many goods the market in that territory is capable 
of absorbing. 

Mr. Weaver also stressed the point that, because a given district will absorb 
one per cent of the total sales of a given commodity in the United States, it by no 
means follows that it will also absorb one per cent of some other commodity. 
Since this is true, it is generally necessary to build a new set of quotas for each 

particular type of commodity. His studies indicate that the potential market 
for motor cars is largely dependent upon the number of persons having a sufficient 
margin between income and cost of living to permit of purchase. 

The quotas which he has computed for this particular purpose have therefore 
been based primarily on the estimates of income by states made by the National 
Bureau of Economic Research. Since income figures are not available by coun- 
ties, he found it necessary to develop a formula involving only such data as are 
available by counties but which would provide an index reflecting the per cent 
of the state income received by each county. To test the merits of such a for- 
mula it was first applied to the states themselves. This procedure provided a 
basis of checking the formula estimates with the more refined estimates pro- 
vided by the National Bureau of Economic Research. 

(A detailed account of the methods employed by him for the distribution of in- 
come by counties will appear in the April issue of the Harvard Business Review.) 
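The note does not give Mr. Weaver's formula, but the two-step logic it describes can be sketched in Python: build a composite index from data available for the smaller units, check it against the National Bureau's estimates at the state level, and, if it holds up, use it to divide a state's income among its counties. The indicator names, weights, and every figure below are hypothetical placeholders, not his actual variables.

```python
def composite_shares(indicators, weights):
    """Each unit's share (summing to 1) of a weighted average of its indicator shares.

    indicators: {unit: {indicator_name: value}}
    weights:    {indicator_name: weight}  (hypothetical, for illustration)
    """
    totals = {name: sum(vals[name] for vals in indicators.values()) for name in weights}
    weight_sum = sum(weights.values())
    return {
        unit: sum(weights[name] * vals[name] / totals[name] for name in weights) / weight_sum
        for unit, vals in indicators.items()
    }


def mean_abs_error_points(estimated, benchmark):
    """Mean absolute difference, in percentage points, between two sets of shares."""
    return 100 * sum(abs(estimated[u] - benchmark[u]) for u in benchmark) / len(benchmark)


def allocate(total, shares):
    """Distribute a total (e.g. a state's income) in proportion to the shares."""
    return {unit: total * share for unit, share in shares.items()}


# Step 1: test the formula on states, where benchmark income shares exist.
# All figures are invented.
state_indicators = {
    "State A": {"tax_returns": 120_000, "telephones": 300_000},
    "State B": {"tax_returns": 60_000, "telephones": 200_000},
    "State C": {"tax_returns": 20_000, "telephones": 100_000},
}
benchmark_state_shares = {"State A": 0.55, "State B": 0.32, "State C": 0.13}
weights = {"tax_returns": 2, "telephones": 1}

est = composite_shares(state_indicators, weights)
print(f"mean error vs. benchmark: {mean_abs_error_points(est, benchmark_state_shares):.2f} points")

# Step 2: if the error is acceptable, apply the same formula to the counties
# of one state and split that state's (hypothetical) income accordingly.
county_indicators = {
    "County 1": {"tax_returns": 9_000, "telephones": 25_000},
    "County 2": {"tax_returns": 3_000, "telephones": 15_000},
}
county_income = allocate(1_000_000_000, composite_shares(county_indicators, weights))
print(county_income)
```

Because each indicator is first turned into a share of its own total, indicators measured in different units can be combined, and the composite shares always sum to one.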

The next speaker was Mr. Donald R. Cowan, of the Commercial Research 
Department, Swift and Company, Chicago. He expressed the view that many 
of the criteria used in the construction of sales quotas are erroneously believed to 
be of merit merely because they vary somewhat in proportion to population and, 
in the case of many kinds of merchandise, population is a dominant feature in 
determining the amount of goods sold. He suggested, therefore, that the first 
step necessary in most instances is to place all data on a per capita basis. The 
maker of sales quotas should always distinguish carefully between urban and 
rural markets, for these two markets differ widely in many respects. 

Mr. Cowan’s experience indicated that it is easier to obtain good results by the 
use of correlation than by depending mainly upon trial and error. He stated 
that he had found magazine circulation to be correlated very closely with urban 
population. He, therefore, believes that, in most instances, it is better to use 
urban population directly rather than magazine circulation. 
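Mr. Cowan's two suggestions, putting data on a per capita basis before comparing territories and then measuring how closely one series tracks another, can be illustrated with a minimal Python sketch. The territory figures are invented, and the Pearson coefficient is used here as the measure of correlation; the note does not say which measure he used.

```python
import math


def per_capita(values, population):
    """Divide each territory's figure by its population."""
    return {t: values[t] / population[t] for t in values}


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented territory figures.
population = {"T1": 500_000, "T2": 800_000, "T3": 300_000, "T4": 1_200_000}
urban_pop = {"T1": 200_000, "T2": 560_000, "T3": 60_000, "T4": 720_000}
circulation = {"T1": 40_000, "T2": 110_000, "T3": 15_000, "T4": 150_000}

territories = sorted(population)
urban_share = per_capita(urban_pop, population)
circ_per_capita = per_capita(circulation, population)
r = pearson_r([urban_share[t] for t in territories],
              [circ_per_capita[t] for t in territories])
print(f"r = {r:.2f}")
```

Correlating per capita figures rather than raw totals keeps a shared dependence on population size from inflating the coefficient, which is the pitfall Mr. Cowan warns about; a high `r` on the per capita basis is what would justify using urban population in place of circulation.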

In making up sales quotas the statistician is greatly hampered by the fact that 
many of the data are not in the form desired. Census figures, for instance, in 
many cases fail to cover the exact geographic units in which the statistician is 
interested. The government figures showing the value of agricultural produc- 
tion in a given area do not distinguish between crops sold and crops fed to live 
stock, the result being that there is much duplication. Income tax returns all 
show incomes in dollars and fail to record the fact that dollars do not have the 
same purchasing power in the city as in the country. 

Mr. Cowan stated further that he had found internal statistics more useful than 
external statistics. In other words, the best place to seek for the necessary infor- 
mation is in the records of the company making the sales. He had also discov- 
ered that a sales quota is useless unless it takes into consideration not only the 
purchasing power of the community but also the intensity of the competition pre- 
vailing in the area. 

A statistician attempting to construct sales quotas will rarely succeed in pro- 
ducing useful results unless he is well trained both in economics and advanced 
statistics and is also conversant with the peculiarities of the business under con- 
sideration. 

The third speaker was Mr. Everett R. Smith of the Fuller Brush Company of 
Hartford. He stated that his Company is interested rather in proportional than 
in absolute sales quotas and they believe that the market possibilities for their 
products are still so great that they need not trouble themselves at present about 
the chances of reaching the saturation point in any of the territories covered. 
Their interest lies in studying the buying habits rather than the buying power of 
the people of each community. Their experience indicates that these buying 
habits change so rapidly that data must be current to be of any value. Since 
their buyers are scattered among all classes of population, they have found the in- 
come tax returns, covering as they do only the wealthier classes, to be of prac- 
tically no value for their purposes. 

The figures that they would like to have would be those showing the total pur- 
chases of commodities similar in nature to those dealt in by the Fuller Brush 
Company. In an effort to approximate such figures, a group of concerns have 
banded together and are now reporting their sales in the various geographical 
units. The combined sales are used as an indication of the sales possibilities of 
the given territory. The figures used are all for commodities appealing to the 
same class of customers interested in Fuller Brushes. The data are up-to-date 
and the results are proving very satisfactory. 
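The Fuller Brush approach, using the combined sales of a group of concerns selling to similar customers as the measure of each territory's possibilities, amounts to a proportional split. A minimal Python sketch, assuming the company simply divides a total target in proportion to the group's reported sales (the note does not give the exact rule), with invented territories and figures:

```python
def proportional_quotas(group_sales, company_target):
    """Split a company-wide sales target among territories in proportion to the
    combined sales reported by a group of concerns selling to similar customers.

    group_sales:    {territory: combined sales of the reporting group}
    company_target: the company's total planned sales for the period
    """
    total = sum(group_sales.values())
    return {territory: company_target * sales / total
            for territory, sales in group_sales.items()}


# Invented figures.
group_sales = {"Territory North": 400_000, "Territory South": 250_000, "Territory West": 350_000}
quotas = proportional_quotas(group_sales, company_target=200_000)
for territory, quota in quotas.items():
    print(f"{territory}: {quota:,.0f}")
```

This yields proportional rather than absolute quotas, matching the company's stated interest: the split says where effort should go, not how large the market could ultimately become.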

The discussion was begun by Mr. Paul T. Cherington of the J. Walter Thomp- 
son Company of New York City. Mr. Cherington issued a word of caution 
against the making of too sweeping generalizations. He agreed with Mr. Smith 
that, in using data as criteria, it is essential to discard those not pertaining to 
similar lines. Mr. Cherington pointed out further that purchases may be divided
into two classes, goods which may be classed as part of current expenses and goods 
bought with surplus funds after the essentials have been paid for. The markets 
for these two classes of commodities do not behave in the same manner. Again, 
purchases may be classified into goods bought regularly and goods bought oc- 
casionally and the markets for these two types also are dissimilar. In using 
statistics of magazine circulation, one should always keep the fact in mind that 
subscribers to magazines represent primarily the mentally alert section of the 
population and therefore they are not necessarily typical of the population as a 
whole. 

The discussion was continued by Mr. Arthur Lazarus of the Dry Goods Economist.
He stated that it is obvious that the term “Sales Quota” has been used
in two senses. 





1. Potential buying power. 
2. An immediate task for the sales organization to perform. 


As to the first, the sales quota in terms of potential buying power, there are two 
things which may tend to discredit it: Premature publication of ill-considered
and erroneous estimates; and too long postponement of any kind of an estimate. 
The company that estimates sales quotas has two purposes in view: 


1. To move in tune with the progress of the industry. 
2. To get all the business that it can as an individual company. 

Many trade associations are doing excellent work for their membership in col- 
lecting information from competing companies. One trade association presents 
to its members statistics of current production, shipments, stock orders, imports, 
exports, and new machinery installations. An individual company comparing 
its results with the collective results of the industry may know whether it is doing 
a good job—as good a job as the industry as a whole. Such information, too, 
tends to stabilize the sales and production efforts of the industry. Like Mr. 
Weaver, he also felt that, with the great variety of methods pursued in the estab- 
lishment of sales quotas by individual companies, there is something to be gained 
by collecting and compiling the methods of perhaps a hundred companies. Pre- 
sumably, it is possible to resolve these methods into five or six classes. A given 
concern could then determine by studying the compiled information for its own 
class, better methods for determining its own sales quotas. 
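The comparison the trade association figures make possible, whether an individual company is doing "as good a job as the industry as a whole," can be reduced to comparing two percentage changes. A minimal Python sketch, with a hypothetical function name and invented shipment figures:

```python
def industry_comparison(company_prev, company_now, industry_prev, industry_now):
    """Compare a company's change in a measure (e.g. shipments) with the industry's.

    Returns both percentage changes, the company's share of the industry total in
    each period, and whether the company at least kept pace with the industry.
    """
    company_change = 100 * (company_now - company_prev) / company_prev
    industry_change = 100 * (industry_now - industry_prev) / industry_prev
    return {
        "company_change_pct": company_change,
        "industry_change_pct": industry_change,
        "share_prev_pct": 100 * company_prev / industry_prev,
        "share_now_pct": 100 * company_now / industry_now,
        # With positive figures this is the same as "the company's share did not fall".
        "kept_pace": company_change >= industry_change,
    }


# Invented shipments: the company grew 10%, the industry as a whole 20%.
result = industry_comparison(company_prev=50, company_now=55,
                             industry_prev=1_000, industry_now=1_200)
print(result)
```

A company that grew but lost share, as in this example, is doing worse than the industry even though its own figures alone would look favorable.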

Mr. A. Heath Onthank of the United States Department of Commerce con- 
tinued the discussion. He said that there was frequently a tendency to reject 
any method of calculating sales quotas that was not very simple. He felt, how- 
ever, that a simple method was likely to be so erroneous as to be entirely value- 
less. It is evidently much better to have a right method than to have a simple 
method. 

In the whole study of marketing, it is necessary to differentiate between com- 
modities with which the market is already saturated and those in which there is 
room for expansion. The market for goods of the first class can evidently be 
expanded only by taking away business from competitors in the same field. In 
other lines, expansion in the market takes place not at the expense of competitors 
of the same field but at the expense of competitors in other fields. 

Mr. Onthank emphasized the fact that the Department of Commerce is anx- 
ious to serve the needs of business and that business men should not hesitate to 
ask for the aid they believe necessary. Such requests should be presented di- 
rectly to the Bureau concerned. 

The discussion was closed by Mr. W. J. Donald, Managing Director of the 
American Management Association. He lamented the fact that sales quotas 
are so frequently constructed on the principle of providing a pace-setting machin- 
ery for salesmen and contended that such a method is very likely to destroy the 
morale of the sales organization and accomplishes little or nothing in increased 
sales. Furthermore, where it is effective, it is too likely to lead to loading up the 
market with more goods than can readily be moved by the dealer. He pointed 
out that sales quotas set for pace-making purposes are very likely to cause the 
production of more than can be sold and to leave the financial executives of a 
company with an embarrassing problem of inadequate cash resources. He fav- 
ored the basing of quotas on an honest estimate of what is likely to be accom- 
plished and contended that such a basis of quotas is a sound and fundamental 
basis for budgeting not only sales expense but also production and financial re- 
quirements. He decried the policy of some companies of having three budgets— 
an optimistic budget for stimulating salesmen, a conservative budget in order 
to protect financial resources and an “honest” budget for the guidance of the
executives. The latter seems to him to be the only one fundamentally sound
from the point of view of fixing sales quotas and budgeting practice generally. 
The meeting then adjourned. 


MISCELLANEOUS NOTES 
F. Y. EDGEWORTH 


The science of statistics has suffered a great loss in the death of F. Y. Edgeworth on 
February 13, 1926, at the age of 81. Till the day before his death from pneumonia 
he was attending to the business of the Royal Economic Society and to the proofs 
of an article just published in the Journal of the Royal Statistical Society. He was in 
full physical and mental vigor with his critical faculty as alert as ever, and throughout 
the past winter was generally present at meetings of economists and statisticians; in 
the latter part of January he served on a small committee to which some problems 
arising in the statistics of the coal commission were referred. Perhaps his charming 
combination of courtesy, wit and wisdom, which for a generation had made his 
company a delight to all who were privileged to know him, whether at meetings of 
learned societies, at the Savile Club or at All Souls’ College, Oxford, may be traced 
to his parentage. On his father’s side he was of the Edgeworths, of Edgeworthstown, 
Ireland (Co. Longford), a family settled there in the time of Essex, of which the best 
known member was his aunt, Maria Edgeworth, the authoress. On his mother’s 
side he was Spanish. By education and interest he was alike philosopher, classicist, 
mathematician and economist. His genius did not obtain recognition till the publica- 
tion in 1881 of Mathematical Psychics, An Essay on the Application of Mathematics 
to the Moral Sciences, a brief work, now rare,¹ in which a very important part of the
theory of mathematical economics was founded. In 1891 he was appointed Professor 
of Political Economy at Oxford, a post he held till recently, and in 1893 he became 
editor of the newly founded Economic Journal, and continued in that work literally 
to the eve of his death. He published no book, except that first named and his 
writings are scattered in several journals and reports. Those on economic subjects 
were recently collected and published under his editorship by the Royal Economic 
Society. In his statistical papers there are many threads of analysis and thought, 
often difficult to follow in their separate settings. Their mathematical arrangement 
is often the despair of the formal mathematician, who is not prepared to pass from 
idea to equation with the help of a simile from Vergil, and who prefers certainty in 
the premises and unambiguous results. But Edgeworth delighted in the more 
subtle regions of probability, a subject whose meaning and history were constantly in 
his thoughts. From Mathematical Psychics onwards he was more interested in the
handling of inequalities rather than equations, in probability rather than certainty. 
In consequence he has been one of the very few mathematical statisticians whose 
work is founded deeply in ultimate premises, and those who have the patience to 
acquire his technique will not only follow very pleasant paths in mastering his mean- 
ing, but also will acquire much more definite knowledge of the true import of statistical 
formulae than is in the possession of most students.
A. L. Bowley

¹ It is hoped that arrangements will soon be made to reprint it.


Joint Meeting of the American Statistical Association and the American Public
Health Association.—The American Statistical Association and the American Public
Health Association held a joint dinner meeting at the Aldine Club in New York City
on Friday evening, April 9. Fifty-five persons were present. The gathering was
in commemoration of the twentieth anniversary of the death of William Alexander 
King, the first Chief Statistician for Vital Statistics in the Permanent Bureau of the 
Census. The meeting was presided over by Dr. Louis I. Harris, Commissioner of 
Health of New York City. The guests of honor were Sir Arthur and Lady News- 
holme. Papers were read by Dr. Walter F. Willcox on the “Services of John Shaw 
Billings in the Development of National Vital Statistics, 1880 to 1900”; by Mr. George
H. Van Buren on “William Alexander King and the Federal Registration Service,
1900 to 1906”; and by Dr. William H. Guilfoy on “Cressy Livingston Wilbur and the
Federal Registration, 1906 to 1914.” The topic for discussion was the “Future of
the Federal Registration Service.” The participants were Dr. William H. Davis
and Dr. Louis I. Dublin. Full publication of these papers is planned in the
September issue of this JOURNAL.


Activities of the Group Advisory to the Bureau of Agricultural Economics.—By 
invitation of Honorable Thomas Cooper, Head of the Bureau of Agricultural Econo- 
mics, United States Department of Agriculture, a Committee, consisting of C. W. 
Doten, W. I. King, N. C. Murray, and G. F. Warren, met with the officials and 
members of the staff of the Divisions of Crop and Live Stock Estimates and Sta- 
tistics of that Bureau in Washington for a three days’ conference, February 17-19. 

While this Committee does not officially represent the American Statistical Associa- 
tion in the way that the joint Committee on the Census does, it should be noted that 
it was originally called together by Dr. Henry C. Taylor, then head of the Bureau, 
because of his knowledge of the service the joint Committee was rendering the Bureau 
of the Census. The Committee was organized by Dr. Taylor at the Pittsburgh 
Meeting in December, 1921, and held its first meeting in Washington, February 3-4, 
1922. A second meeting was held March 3-4, 1922, and a third meeting was held a 
year later, February 23-24, 1923. At this meeting G. F. Warren took the place of 
W. M. Persons, who had found it necessary to resign from the Committee because 
of the pressure of other duties after the first year of service. At the last meeting Mr. 
Murray, formerly a member of the staff of the Bureau and now engaged in statistical 
work in Chicago in connection with the grain market, was added to the Committee. 

At its meetings the Committee has dealt with a wide range of problems relating
to methods of obtaining, testing and checking data, the publication of results, and the 
extension of the work. It has given much informal advice in the conferences with 
the experts on the staff and has made a considerable number of recommendations in 
formal reports to the head of the Bureau. 


United States Bureau of Labor Statistics.— Revision of Wholesale Price Index 
Number. The Bureau of Labor Statistics has begun the work of revision of its 
wholesale price index numbers. The number of quotations carried will be considerably
increased and many new articles will be added, while a few, probably very few, of
those now carried will be dropped. It is believed
that a satisfactory scheme can be developed for including automobiles in this index 
number. It has been proposed that self-dumping freight cars, typewriting machines, 
and a number of other articles not now carried can be adjusted to the scheme. It 
will be interesting to note the details of the methods by which a composite type of 
typewriting machines, for instance, can be developed as a unit to be applied to the 
weighting factors. If the plan is worked out successfully the method will be described 
at some future date. It now seems probable that the quantities produced in 1925
will be used as the weighting factors in the revision of the index number now being
planned. The price base will be shifted from 1913 to some post-war year.
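In modern terms the revision described here produces a fixed-weight price index. The note does not give the Bureau's formula, so the sketch below assumes a weighted aggregative form: each commodity's price is multiplied by a fixed quantity weight (standing in for 1925 production), and the weighted total is compared with the same total at base-period prices. The function names and the commodity figures are hypothetical, for illustration only.

```python
from typing import Mapping


def weighted_aggregative_index(
    prices_t: Mapping[str, float],
    prices_base: Mapping[str, float],
    weights: Mapping[str, float],
) -> float:
    """Price index for period t, equal to 100 in the base period.

    The same quantity weights (e.g. 1925 production) value both periods,
    so only changes in price move the index.
    """
    cost_t = sum(prices_t[item] * qty for item, qty in weights.items())
    cost_base = sum(prices_base[item] * qty for item, qty in weights.items())
    return 100.0 * cost_t / cost_base


def rebase(series: Mapping[int, float], new_base: int) -> dict[int, float]:
    """Shift an index series so that the chosen year equals 100."""
    base_value = series[new_base]
    return {year: 100.0 * value / base_value for year, value in series.items()}


if __name__ == "__main__":
    # Hypothetical prices and weights, not Bureau data.
    weights = {"wheat": 2.0, "pig iron": 1.0}
    base = {"wheat": 1.0, "pig iron": 4.0}
    later = {"wheat": 1.5, "pig iron": 4.0}
    print(round(weighted_aggregative_index(later, base, weights), 2))  # 116.67
    print(rebase({1913: 100.0, 1926: 150.0}, 1926))
```

Because the weights are held fixed, dividing the whole series by its value in the new base year gives the same result as recomputing every period against that year's prices. Under this assumption, moving the base from 1913 to a post-war year would not require recalculating each index from the raw quotations.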

















































226 American Statistical Association [108 


A Handbook of Labor Organizations in the United States. As a rather new departure
for the Bureau, a substantially complete and rather elaborate handbook of the trade
union organizations of the United States is being prepared and will soon be published.
This will be much more than a mere directory, as it will contain all the essential
organization features of each union. The work will be closely linked with the
publication of trade agreements, which the Bureau has taken on as a permanent
addition to its work. One substantial volume on wage agreements, Bulletin No. 393,
has already been published, and another is in preparation. The Bureau has for
several years kept a running record of agreements in the Monthly Labor Review, but
the space that publication can allow for this subject has never been sufficient to
cover the ground satisfactorily.

Labor Productivity Index. An attempt is being made in the Bureau of Labor 
Statistics to compile an index of the productivity of labor by industries and industry 
groups. The plan is to coördinate the volume-of-employment index now carried by
the Bureau with production wherever satisfactory statistics of production for an
industry can be obtained. These production figures will be reduced to an index,
and the two indexes will then be combined into an index of labor productivity.
Dr. Ewan Clague of the University of Wisconsin has been placed in charge of this
work, which is being done in coöperation with the Conciliation Division of the
Department of Labor because of the evident application of its results to wage
controversies.
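The note does not say exactly how the two indexes are to be combined. One straightforward reading, sketched below as an assumption, is to express production and employment as indexes on a common base year and divide the first by the second, so that the productivity index rises when production grows faster than employment. The function names and figures are hypothetical.

```python
from typing import Mapping


def to_index(values: Mapping[int, float], base_year: int) -> dict[int, float]:
    """Express a raw series (e.g. units produced) as an index equal to 100 in base_year."""
    base = values[base_year]
    return {year: 100.0 * v / base for year, v in values.items()}


def productivity_index(
    production_index: Mapping[int, float],
    employment_index: Mapping[int, float],
) -> dict[int, float]:
    """Output per worker as an index: 100 * production index / employment index.

    Both inputs must share the same base year; only years present in
    both series are returned.
    """
    return {
        year: 100.0 * production_index[year] / employment_index[year]
        for year in production_index
        if year in employment_index
    }


if __name__ == "__main__":
    # Hypothetical output figures and employment index for one industry.
    output = to_index({1923: 500.0, 1925: 600.0}, base_year=1923)
    employment = {1923: 100.0, 1925: 96.0}
    print(productivity_index(output, employment))  # {1923: 100.0, 1925: 125.0}
```

In this toy case output rose 20 per cent while employment fell 4 per cent, so output per worker rose 25 per cent.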

Industrial Accident Conference. The Secretary of Labor will call an industrial
accident conference some time in July. The governors of the various states will be
requested to send representatives of the accident prevention and accident reporting
bureaus of their states. In addition, a call will be sent to the principal industries
now engaged in effective accident prevention work. The conference will be called
through the Bureau of Labor Statistics, which will outline its general policy and
purpose.

Wages and Hours of Labor. Field work on wages and hours of labor in the auto- 
mobile industry, 1926, is practically finished. A survey of wages and hours of labor 
in the iron and steel industry for 1926 is fairly under way, the field work having been 
begun March 1. 

Mr. Jesse C. Bowen, chief statistician of the Bureau, is in Europe attempting to 
bring together industrial information in connection with the pottery industry to 
correspond with the material secured from plants in the United States. Mr. Bowen 
will visit England, France, Germany and Czechoslovakia. 

A study of the growth in productivity of industrial machines, by industries, has been
undertaken by the Bureau. Dr. Boris Stern of Columbia University is engaged in this
study and is now in the field investigating the glass-making industry.

Mr. Hugh S. Hanna, a graduate of Johns Hopkins University, who was with the
Bureau for a number of years, resigned to take a position with the War Labor Board
during the war, and has since been connected with private economic research
activities, has been appointed editor of the Monthly Labor Review. He succeeds
Mr. H. L. Amiss, who died September 18, 1925.


The American Historical Association Endowment Campaign.—The American 
Historical Association is making a general appeal to the public for coöperation in
raising an adequate endowment. The national character of the Association has been 
recognized by the Federal Government. It was chartered by Congress in 1889 and 
its annual reports, presented through the Smithsonian Institution, are included in the 








































Notes 227 













































109] 


series of Congressional documents. The service rendered has also been national. 
By the terms of its charter the Association is made responsible for the promotion of 
“American history, and of history in America.” It is engaged in direct public service
through the Public Archives Commission, with its valuable reports describing the 
archives of the states. Further, the Association undertakes to place at the disposal 
of the Government the resources of historical scholarship. Another contribution to 
a better understanding of American history has been made through the Historical 
Manuscripts Commission which is constantly locating valuable manuscripts, espe- 
cially those now in private hands. Such work as the Association has already done
has been made possible only through an immense amount of unpaid service. It is not
possible, or even desirable, to appeal to Congress for much assistance from the Federal 
Treasury beyond what is now received through the publication of the Annual Reports. 

What the Association now asks is an increase in endowment from $50,000 to 
$1,000,000, with the expectation that the additional income thus provided will be 
used, not only to secure more certain and adequate support for work already under- 
taken, some of which has been seriously curtailed or delayed by lack of such support; 
but also to make possible certain vital new forms of service. The Office of the 
Executive Secretary of the Committee on Endowment is at 110 Library, Columbia 
University, New York City. 


Increasing the Prosperity of the New England States.—The regional consciousness 
of the six New England states as a unit began to find expression in the formation in 
November, 1925, of an All New England Conference. At a large and impressive 
meeting held at Worcester at that time, a permanent organization was created for 
the purpose of dealing with problems common to all of the six New England states, 
such as water-power, marketing, industrial efficiency and progress, and agriculture. 

Under the form of organization adopted at the Worcester meeting, the permanent
Council was to consist of twelve members from each of the six states,
these members being selected by the delegates from each of the states present at the 
general meeting in Worcester and at a similar annual general meeting to be held 
thereafter. The delegates to the general meeting were to be representatives of 
various organizations such as agricultural, industrial and financial organizations, 
Chambers of Commerce, etc. 

The Worcester meeting and the addresses there delivered attracted attention and 
interested comment throughout New England, and the movement began under the 
most favorable auspices. Two meetings of the permanent Council have been held 
since its organization—one at Providence in December and a second at Portland on 
the 26th of March. A permanent organization is maintained with headquarters in
Boston; a budget of fifty thousand dollars has been established, raised by assigning a
quota to each state; and systematic, thoughtful effort is being expended upon the
principal basic problems which relate to New England progress and prosperity. One
of the most important results for the present year will be a careful survey of certain 
phases of industrial conditions, with special reference to products and marketing, to 
be made in connection with the Department of Commerce survey of New England 
about to begin this spring. 

While movements of this kind are necessarily general, there is good reason to 
believe that the earnest and unselfish efforts of those connected with this enterprise 
known as the New England Council will produce results that will be of real service 
to the six states. This is a period in which there is much self-searching in progress 
in the different states of the Union, and for the first time states are beginning to 
compete systematically against each other in well-ordered, businesslike effort to 























advertise and push their own attractions and advantages. Hence the New England 
states, which in the opinion of the rest of the country form a distinct geographic area
whose boundaries are almost as clearly marked as those of a single state, are wise in taking up
this task as a unit rather than depending entirely upon separate action. 

This movement has practical interest for the members of the American Statistical
Association, because many of its activities will be based on statistical and research
work.

The headquarters of the New England Council are established at 201 Devonshire 
Street, Boston. John S. Lawrence, a prominent Massachusetts manufacturer, is 
President, and Dudley Harmon, Permanent Secretary. 


The First American Health Congress.—A First American Health Congress was 
held during the week May 17 to 22 at Atlantic City. Addresses were delivered by 
Sir Arthur Newsholme, Professor C.-E. A. Winslow, and Dr. George E. Vincent, as 
well as many other renowned speakers. Those interested in the details of the meeting 
should address Mr. Homer N. Calver, General Director, American Health Congress, 
370 Seventh Avenue, New York City. 


Progress of Statistical Plan for the Study and Prevention of Public Accidents.— 
The Uniform Plan for reporting and tabulating public accidents in the United States, 
as prepared originally by the Committee on Public Accident Statistics of the National 
Safety Council, is meeting with general support from state and city officials. The 
statistics published currently by the Motor Vehicle Bureau of the State of New York 
are prepared in general accordance with the National Safety Council plan. 

The State of Rhode Island has recently begun to prepare public accident statistics 
for that area and is reporting the same to the National Safety Council at Chicago. 
More than forty cities are also coöperating in this effort. It is hoped during 1926 to
add to the number of states and cities represented in this service. 

For the use of statisticians in state and municipal services, the Committee on 
Public Accident Statistics is preparing a Manual of definitions for classification 
purposes. This Manual will also include directions for the tabular and other pres- 
entation of the results. It is hoped that the Manual will be of use to police officials 
and to state commissioners in charge of the administration of the motor vehicle laws. 
Information regarding the statistical system, and copies of the forms and schedules, 
may be obtained from Mr. S. J. Williams, Chief, Public Safety Division, National
Safety Council, 108 East Ohio Street, Chicago, Illinois. 


Winners of Chicago Trust Prizes.—Ralph E. Heilman, Dean, Northwestern 
University School of Commerce, as chairman of a Committee of Award has announced 
the winners of the Chicago Trust Company prizes for original research in business and 
finance. 

The first prize of $300 was won by William Alexander Grimes of Catonsville,
Maryland, whose subject was “Financing Automobile Sales by the Time Payment
Plan.” The second prize of $200 was won by Gerald M. Francis, of Urbana, Illinois,
whose essay dealt with “Financial Management of Farmers’ Elevator Companies.”

In addition to these annual prizes, there is a triennial research prize of $2,500. 
This prize will be awarded every three years for an unpublished study which is 
submitted in competition and which is considered to contain the greatest original 
contribution to knowledge and advancement in the field outlined. The award will 
be made in the autumn of 1927. Papers are due not later than June 1, 1927. No
restrictions are made as to eligibility of contestants for this prize. The donors have 










in mind particularly officers of banks, business executives, practicing attorneys, 
members of teaching staffs, and advanced graduate students in the field of economics 
and business. The papers must be written in the English language. 

Further information may be secured from Professor Leverett S. Lyon, Robert 
Brookings Graduate School of Economics and Government, Washington, D. C. 


A census of the population and agriculture of the three Prairie Provinces of Canada 
—Manitoba, Saskatchewan, and Alberta—will be taken by the Dominion Bureau of 
Statistics as of June 1, 1926. The taking of this census is prescribed in the
various Acts of the Dominion Parliament constituting these provinces, the raison
d’être being the more rapid growth of Western Canada and the important financial
and economic considerations depending thereon. The census will be practically 
identical in scope with the decennial census of the whole Dominion taken in 1921. 


Official announcement has been made of the retirement of Sir George H. Knibbs 
from the Directorship of the Institute of Science and Industry of the Commonwealth 
of Australia. As yet no information has been afforded as to who will be designated 
his successor. 


Professor William F. Ogburn has resigned the editorship of this JOURNAL and is
continuing his stay in France, pursuing research on social and economic conditions in 
that country. 


Mr. George Bassett Roberts, for some years Manager of the Reports Department 
of the Federal Reserve Bank of New York, has resigned to become Statistician of the 
National City Bank, of which his father, George E. Roberts, is Vice-President. 


COMMITTEES, 1926 


Representatives on the Social Science Research Council 
Term Expires 


Walter F. Willcox
Edmund E. Day


Representatives on the Advisory Committee on the Census 
Term Expires 


Robert E. Chaddock...........................December 31, 1926


Representative on the Board of Directors of the National Bureau of Economic Research 
Malcolm C. Rorty 


Representatives on the Joint Committee on Standards for Graphics 

Karl G. Karsten, Chairman Howard W. Green 

Irving Fisher Arthur H. Richardson 
Representatives on the Joint Committee on the Encyclopedia of the Social Sciences 


Mary Van Kleeck, Chairman 
William F. Ogburn R. H. Coats 




























Committee on Fellows 


Edmund E. Day, Chairman......................December 31, 1926
.............................................December 31, 1927
.............................................December 31, 1928
Louis I. Dublin..............................December 31, 1929
.............................................December 31, 1930


Committee on Nominations 


Walter W. Stewart, Chairman 


Donald R. Belcher Fred G. Tryon 
Committee on Institutional Statistics 
Horatio M. Pollock, Chairman Edith M. Furbush 
Kate H. Claghorn Joseph A. Hill 
Committee on Governmental Labor Statistics 
Mary Van Kleeck, Chairman J. F. Dewhurst 
A. J. Altmeyer Leonard W. Hatch 
Chas. E. Baldwin Ralph G. Hurlin 
Joseph A. Becker Richard Lansburg 
W. A. Berridge Don G. Lescohier 
Louis Bloch Eugene B. Patton 
W. R. Burgess Roswell F. Phelps 
R. D. Cahn Woodlief Thomas 
R. H. Coats F. G. Tryon 
Frederick E. Croxton Howard H. Ward 


Committee on Finance
Edward E. Lincoln, Chairman 


Frederick R. Macaulay Williford I. King 
Membership Committee 

Edwin W. Kopf, Chairman A. H. Mowbray 

W. Randolph Burgess Jessamine S. Whitney 


Committee on Place of Annual Meeting in 1926 
Leonard P. Ayres 


Committee on Advertising 
Paul T. Cherington Earle Clark 
S. O. Martin 
Committee on Social Statistics 
Ralph G. Hurlin, Chairman 


George Bedinger Emil Frankel 
William A. Berridge Hornell Hart 
F. Stuart Chapin Philip Klein 


Neva Deardorff Linton B. Swift 







MEMBERS ADDED SINCE MARCH 1926 


Aylward, Arthur William, Assistant Manager, Cheney Brothers, New York, N. Y. 

Barnum, C. L., American Radiator Company, New York, N. Y. 

Benjamin, (Miss) E. Gail, Statistical Department of J. Henry Schroeder Banking 
Corporation, New York, N. Y. 

Birks, Henry G., Henry Birks and Sons, Ltd. (Jewelers), Montreal, Quebec, Canada 

Black, James E., Economic Analyst, Bureau of Mines, Washington, D. C. 

Bresciani-Turroni, Constantino, Professor of Political Economy at the University 
of Bologna, Bologna, Italy 

Brown, Homer S., Western Electric Company, 195 Broadway, New York, N. Y. 

Bruére, Dr. Henry, Metropolitan Life Insurance Company, New York, N. Y. 

Burmeister, C. S., Bureau of Agricultural Economics, Washington, D. C. 

Butler, Elizabeth R., Institute of Religious Research, Teachers College, Columbia 
University, New York, N. Y. 

Cantrell, Georgia E., United States Department of Agriculture, Washington, D. C. 

Cater, Marion B., Bell Telephone Laboratories, New York, N. Y. 

Cavanaugh, Frank J., Commodity Statistics, National Bank of Commerce, New York, 
N. Y. 

Charles, Oscar H., Division Superintendent of Schools, Bureau of Education, Manila, 
Philippine Islands 

Clark, F. Haven, General Analysis of Securities, Scudder, Stevens and Clark, Boston, 
Mass. 

Coan, William, Washington and Lee University, Lexington, Va. 

Colgan, Virginia E., Association of Life Insurance Presidents, New York, N. Y. 

Cotins, Ira S., Assistant Economist of R. H. Macy and Company, New York, N. Y. 

Crosby, G. R., Columbia University, New York, N. Y. 

Crump, Norman E., Financial Times, London, England 

Dodge, Edward L., General Auditor, Federal Reserve Bank of New York, New York, 
N. Y. 

Dove, Wallace B. K., Director Sales and Production Research, United Business 
Service, Cambridge, Mass. 

Davis, Frank V., Statistician, Ohio Bureau of Coal Statistics, Columbus, Ohio 

Dennis, Wendell M., Market Analysis, American Cyanamid Company, 511 Fifth 
Avenue, New York, N. Y. 

Dickerman, Emily E., General Insurance Statistics, Phoenix Mutual Life Insurance 
Company, Hartford, Conn. 

Dredger, C. M., Standard Oil Company of New York, New York, N. Y. 

Dutcher, Professor Dean, Pennsylvania State Forest School, Mont Alto, Pa. 

East, John D., American Telephone and Telegraph Company, New York, N. Y. 

Eells, Professor W. C., Applied Mathematics, Whitman College, Walla Walla, Wash. 

Epstein, Lillian, Student at Barnard College, New York, N. Y. 

Evans, John Joseph, Jr., Forecast of Business Conditions, Armstrong Cork Company, 
Lancaster, Pa. 

Ewing, P. V., Sears Roebuck Agricultural Foundation, Chicago, Ill.

Finn, (Mrs.) Ethel E., Statistical Assistant, United States Bureau of Mines, Wash-
ington, D. C.

Fisher, Ernest M., Director of Education and Research, National Association of
Real Estate Boards, Chicago, Ill.

Foote, Myrtle, Bureau of Statistics, Interstate Commerce Commission, Washington,
D. C.





































Friedmann, G., Sales Promotion, Great Atlantic and Pacific Tea Company, New 
York, N. Y. 
Fuller, Carlton P., Western Electric Company, New York, N. Y.
Gasteiger, E. L., Bureau of Statistics, Bureau of Agriculture, Harrisburg, Pa. 
Gostomski, Adam B., Economics Department, Moody’s Investors Service, New York, 
N. Y. 
Gray, E. R., Section of Statistics, Treasury Department, Washington, D. C. 
Griggs, Julia, Council of Social Agencies and Community Fund of Columbus and 
Franklin Counties, Columbus, Ohio 
Halbert, Keet W., American Telephone and Telegraph Company, New York, N. Y. 
Hager, (Mrs.) Margaret F., Jefferson Clinic Auxiliary, New York, N. Y. 
Halboth, Henry C., Public Accounting, Hart Brothers, Drake and Company, New 
York, N. Y. 
Hale, Sydney A., Editor, Coal Age, New York, N. Y. 
Hansell, Sven B., New York Telephone Company, New York, N. Y. 
Hawkins, George E., Turner S. Underwood, C. P. A., Pueblo, Colo. 
Harburger, Philip S., Student at Columbia University, New York, N. Y. 
Hasse, Adelaide R., Institute of Economics, Washington, D. C. 
Hemenway, Ruth A., Russell Sage Foundation, New York, N. Y. 
Heyboer, Adrian M., Employment and Cost Statistics, The Joseph and Feiss Com- 
pany, Cleveland, Ohio 
Hill, Daniel A., Engineering Department, The Ohio Public Service Company, Cleve- 
land, Ohio 
Hillyer, D. V., Port of New York Authority, Highway Traffic Analysis, 11 Broadway, 
New York, N. Y. 
Hinrichs, Arnold F., Graduate Student at University of Minnesota, St. Paul, Minn. 
Hoffman, Hilda M., Metropolitan Life Insurance Company, New York, N. Y. 
Hogg, Margaret Hope, Instructor in Economics, Smith College, Northampton, Mass. 
Holland, Herbert H., Port of New York Authority, New York, N. Y. 
Ingraham, Mark H., Assistant Professor of Mathematics, University of Wisconsin, 
Madison, Wis. 
Keppel, Dr. Frederick Paul, Carnegie Corporation of New York, 522 Fifth Avenue, 
New York, N. Y. 
King, Robert B., Illinois Bell Telephone Company, Chicago, Ill.
Kotthoff, D. H., Southwestern Bell Telephone Company, St. Louis, Mo. 
Kromer, Oswald C., The Joseph and Feiss Company, Cleveland, Ohio 
Kuvin, Leonard, Kardex Institute of Business Management, New Haven, Conn. 
Lansburgh, R. H., Secretary of Labor and Industry, State of Pennsylvania, Harris- 
burg, Pa. 
Lawrence, Gordon, Corona Typewriter Company, Groton, N. Y. 
Leavens, Professor Dickson H., Yale in China, Changsha, China 
MacFadden, Dr. Frederick A. R., Barrow, Wade, Guthrie & Company, Public 
Accountants, New York, N. Y. 
McCarthy, D. J., Operating Statistics, Great Atlantic and Pacific Tea Company, New 
York, N. Y. 
McDermott, Frederick J., State Insurance Fund, New York, N. Y. 
McDonough, Miss A. J., Statistical Bureau, The Halle Brothers Company, Cleveland, 
Ohio 
McLendon, Idus Rowe, Standard Statistics Company, New York, N. Y. 
Michener, Dr. Anna M., National Bank of Commerce, New York, N. Y. 













Molina, Edward C., Development and Research, American Telephone and Telegraph 
Company, New York, N. Y. 

Mowrer, Dr. Ernest R., Social Science Research Council, Chicago, Ill. 

Naess, Ragnar D., Report Department, Federal Reserve Bank, New York, N. Y. 

Nash, A. G., Student at University of Wisconsin, Madison, Wis. 

Olmstead, Dr. Paul S., Bell Telephone Laboratories, New York, N. Y. 

Parkhurst, Elizabeth, Division of Vital Statistics, State Department of Health, 
Albany, N. Y. 

Rhoads, Professor Joseph H., Northern Normal and Industrial School, Aberdeen, S. D. 

Rose, Luther D., Statistics Section, Port of New York Authority, New York, N. Y. 

Sarle, Charles F., Agricultural Statistician, Bureau of Agricultural Economics, Divi- 
sion of Crop Estimates, Washington, D. C. 

Schwobel, E. F., Public Utilities Underwriting House, New York, N. Y. 

Scott, T. W., Economic Research, Metropolitan Life Insurance Company, New York, 
N. Y. 

Simmons, Professor C. D., Statistician to President of the University of Texas, Austin, 
Tex. 

Small, P. H., Investment Securities, Schwabacher and Company, San Francisco, Cal. 

Smith, Dr. Frederick M., Reorganized Church of Jesus Christ of Latter Day Saints, 
Kansas City, Mo. 

Smith, Philip W., Manager, Statistical Department, Fuller Brush Company, Hartford, 
Conn. 

Stevens, John Randolph, 161 Lenox Avenue, Providence, R. I. 

Sturges, Professor H. A., Teaching and Research, Washburn College, Topeka, Kans. 

Tagliacarne, Dr. Guglielmo, Vice-Secretary General and Director of Bureau of Sta- 
tistics, Chamber of Commerce, Milan, Italy 

Taylor, James S., Division of Building and Housing, Department of Commerce, Wash- 
ington, D. C. 

Toolan, Cyprian A., Theodore Prince and Company, New York, N. Y. 

Updike, Harold W., Investigation, Equitable Trust Company, New York, N. Y. 

Van Newland, Roscoe, 108 West 90 Street, New York, N. Y. 

Waldron, James Lawrence, Technical Assistant to Statistician, Brooklyn Edison 
Company, Pearl and Willoughby Streets, Brooklyn, N. Y. 

Walker, Dr. Francis, Federal Trade Commission, Washington, D. C. 

Warner, Guy W., Student at School of Commerce, University of Denver, Denver, 
Colo. 

Wilkinson, Edward C., Bond Sales Manager, 5 Nassau Street, New York, N. Y. 

Wissler, Frank E., General Petroleum Corporation, Portland, Ore. 

Young, Dr. E. C., Agricultural Statistics, Purdue University, Lafayette, Ind. 













REVIEWS 


Wheat Studies of the Food Research Institute. Vol. 1, Numbers 1-10, Decem-
ber 1924-September 1925. Stanford University Press. viii, 375 pp.

“In the realm of wheat no two years are alike; each has its peculiarities.” 
With this sentence the Food Research Institute of Stanford University begins its 
series of wheat studies. It is this characteristic of wheat production and the 
movement of its price that has baffled the wheat farmer, the grain merchant, the 
miller, the publicist, the economist, and even the speculator, since international 
trade in wheat began. The series of studies under review, according to the
Institute, “is designed to give a sound, impartial review of the world wheat 
position and outlook, . . . with due recognition to economic conditions in
exporting and importing countries.” Each of the above-mentioned classes should
welcome a publication whose avowed object is to throw more light on some of the 
unsolved economic problems connected with the production, international trade, 
and prices of one of the world’s important cereal crops. 

The first volume of these studies, December 1924—September 1925, has been 
completed and it is now possible to indicate the method of treatment and appraise 
some of the results. The series of studies (ten in number) may be divided into 
two groups: One group dealing with the developments in the current wheat 
situation, the other with fundamental economic studies of a more permanent 
nature. Thus the first volume consisted of four numbers on the changing posi- 
tion of wheat during 1923-24 and 1924-25, and six numbers discussing such 
questions as (1) “The Dispensability of a Wheat Surplus in the United States,” 
(2) “Costs of Wheat Production in the North American Spring Wheat Area,” 
(3) “European Wheat Production as Affecting Import Requirements,” and 
(4) “Wheat Production and Export from Canada.” 

The numbers dealing with current developments in the wheat situation will 
probably command the most attention. Here the authors present a detailed 
analysis, flanked with many tables and charts, of the wheat crops of various 
countries, the so-called import “requirements,” the export surpluses, visible 
supplies, and other factors which seem to influence the course of wheat prices. 
The wheat situation for each crop year is pictured as a balance between the
“effective requirements” of importing countries and the “genuine surpluses” of
exporting countries. Tables are given showing the amount of wheat that will
probably be exported by such surplus countries as Canada, United States, 
Argentina, Australia, India, etc., together with the needs of importing countries 
—principally those of Northwestern Europe. 

Each country is analyzed in detail. In the importing countries allowance is 
made for probable carry-over of old wheat, production of substitute crops, such 
as rye and potatoes, and internal economic position. In exporting countries the 
surplus is determined by a consideration of home requirements, probable varia- 
tions in the carry-over, the quality of the new crop, and other factors peculiar to 
the country. In no other publication is there gathered together so much per- 
tinent information on the wheat outlook for each country. 








































117] Reviews 235 





The summary tables of probable importers’ “requirements” and exporters’
“surpluses” apparently form the groundwork of the price analysis. It is stated
in No. 10, p. 352, “as we have frequently had occasion to emphasize, the margin
between genuine exportable surpluses and importers’ effective requirements is
far more important in the wheat market than changes in the size of the world’s
crop as a whole.” (See also No. 5, pp. 150-51, and No. 7, p. 211.) When the ratio
of export surplus to import requirement is high the world level of price should be 
low, and vice versa. In addition separate consideration is given to the probable 
movement in price in the United States. The analysis here is most complete. 
While the public thinks of wheat in terms of Chicago prices (principally future 
prices), the Institute emphasizes the fact that within the country prices must be 
studied by classes of wheat if one wishes to form an intelligent opinion of the 
wheat situation. While all classes of wheat are inter-related, the market for 
each class has its own peculiarities and its own price. It is possible for this coun- 
try to import one class of wheat over a high tariff wall, and, at the same time, 
export another class in competition with other exporting countries. The con- 
clusions on price movements in the United States are based upon a consideration 
of the relative scarcity or abundance of these various classes of wheat as well as 
upon the probable world’s level of price. 
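As the reviewer summarizes it, the Institute's rule of thumb compares the exportable surplus with import requirements: a high ratio points to a low world price, a low ratio to a high one. The sketch below only computes that ratio and ranks crop years by it. It does not translate the ratio into a price, which the Institute itself declines to do. The function names and the crop-year figures are hypothetical placeholders, not estimates from Wheat Studies.

```python
from typing import Mapping


def surplus_ratio(export_surplus: float, import_requirement: float) -> float:
    """Ratio of exporters' surplus to importers' requirement (same units)."""
    if import_requirement <= 0:
        raise ValueError("import requirement must be positive")
    return export_surplus / import_requirement


def rank_by_price_pressure(
    balances: Mapping[str, tuple[float, float]],
) -> list[str]:
    """Order crop years from firmest to weakest expected world price.

    Each value is a (surplus, requirement) pair; a lower ratio is read
    as firmer prices, following the rule described in the review.
    """
    return sorted(balances, key=lambda year: surplus_ratio(*balances[year]))


if __name__ == "__main__":
    # Hypothetical (surplus, requirement) pairs in millions of bushels.
    balances = {"year A": (800.0, 640.0), "year B": (700.0, 700.0)}
    print(rank_by_price_pressure(balances))  # ['year B', 'year A']
```

The reviewer's later objection applies directly to the inputs: both numbers rest on judgments of what is “genuine” and “effective,” so two analysts feeding this calculation could reach different rankings.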

The Institute well recognizes that definite price prediction in the case of a 
commodity like wheat would be a gratuitous undertaking. In every case where 
the price outlook is discussed, emphasis is placed upon the fact that prices are 
continually under the influence of forces whose strength it is not possible ac- 
curately to gauge at the moment when they may be exerting major force. Frosts 
in Argentina, drought in Canada, or floods in Europe are quickly reflected in 
price movements in advance of any accurate estimates of the extent of crop 
deterioration. The presence or absence of speculation on the part of the public 
in wheat may account for temporary price movements out of all proportion to 
actual developments. With these as well as other reservations the price outlook 
is analyzed and modest conclusions stated as to the level of price to be anticipated. 

The price analysis does not proceed very far in the direction of a quantitative 
statement of the effects of supply and demand changes. It is conceded by the 
authors that a movement in the general level of prices has a correlative effect 
on the price of wheat, but no attempt is made to judge quantitatively the relation 
between these two variables. It is also stated that the supply of rye has an 
influence on wheat prices, but in no place is there any statement as to the past 
effect of the supply of rye on the price of wheat. Given a 10 per cent increase in 
the supply of rye, how much on the average should the wheat price be affected, 
other things being equal? The assertion that the balance between export sur-
pluses and import requirements is the chief price-making factor might also well
be called into question by the student of prices, even though this method has the
sanction of precedent of long standing. The first objection to the method lies 
in the fact that there are so many subjective elements in it. The use of the terms 
“genuine” and “effective” and “requirements” suggests that we may have as 
many estimates as there are students of the problem. Secondly, if it is the mar- 
gin between export surplus and import requirement that is the most important
factor in price determination, why is it that we find in No. 10, p. 351 (September
1925) this statement: 


In the light of these various influences, so far as they can be appraised at pres- 
ent, Broomhall’s estimate of aggregate European import requirements appears 
conservative, and perhaps Sir James Wilson’s higher estimate as well. But much 
will depend upon world wheat prices, and particularly upon exports from Canada 
and Russia. Should¹ import prices decline considerably, and next year’s crops
promise badly, European imports may exceed Broomhall’s suggested figure by con-
siderable amounts.


Doesn’t this statement mean that the Institute’s summary tables of “importers’
effective requirements” are drawn up with some assumption already made with
respect to price?² “Importers’ effective requirements” cannot be both price
determined and price determining. It would seem from a perusal of the studies 
that the authors were governed by many other considerations than the margin 
between the export surplus and the import requirements. 
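The reviewer’s earlier question, how much a 10 per cent increase in the supply of rye should on the average move the price of wheat, is the kind of question a price-flexibility coefficient answers. The sketch below is only an illustration under stated assumptions: it fits a constant-elasticity (log-log) relation by ordinary least squares to a single explanatory series, so it does not honor the “other things being equal” condition, which would call for a multiple regression including, for example, the general price level. The function names and the series are hypothetical, not figures from the Institute’s studies.

```python
import math
from typing import List, Tuple


def fit_log_log(supply: List[float], price: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log(price) = a + b * log(supply).

    The slope b is the price flexibility: the approximate percentage change
    in price associated with a 1 per cent change in supply.
    """
    xs = [math.log(s) for s in supply]
    ys = [math.log(p) for p in price]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def price_change_pct(b: float, supply_change_pct: float) -> float:
    """Percentage price change implied by flexibility b for a given percentage supply change."""
    return ((1.0 + supply_change_pct / 100.0) ** b - 1.0) * 100.0


# Hypothetical illustrative series (not real data): prices constructed so that
# they fall with supply at a flexibility of exactly -0.3.
rye_supply = [40.0, 45.0, 50.0, 55.0, 60.0]
wheat_price = [2.0 * s ** -0.3 for s in rye_supply]

a, b = fit_log_log(rye_supply, wheat_price)
effect_of_10_pct = price_change_pct(b, 10.0)
```

With a flexibility of -0.3, a 10 per cent increase in supply implies a fall in price of a little under 3 per cent; with actual series the fitted b would be the quantitative statement the reviewer finds missing.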

The Institute should be commended generally by students of prices for as- 
sembling a wealth of statistical material on every phase of the wheat trade. In 
the presentation of the material little attempt has been made to apply mathemat- 
ical or statistical methods to the data. For instance, in comparing yields, pro- 
duction, and export of wheat by countries, the present year’s figures are compared 
with last year and a five-year pre-war average. The latter average is usually 
interesting but probably of little value in helping to interpret present data on 
wheat yields and production in such countries as Canada or Australia, where 
acreage trend has been sharply upward and the trend of yield per acre has been 
upward in one country and downward in the other. Long run trends which 
offend neither mathematics nor logic might be more useful in pointing out the 
direction in which we seem to be traveling at the present time than a com- 
parison of the present with a particular five-year average (1909-13). 
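The alternative the reviewer prefers, a long-run trend rather than a fixed pre-war base, can be sketched as a least-squares straight line fitted to annual figures and read off at the current year. The sketch assumes that a straight line is adequate (some series may need a curve) and uses a made-up acreage series purely to show the mechanics; the function names are hypothetical.

```python
from typing import List, Tuple


def linear_trend(years: List[int], values: List[float]) -> Tuple[float, float]:
    """Least-squares straight line; returns (slope, trend value at the mean year)."""
    n = len(years)
    mean_year = sum(years) / n
    mean_value = sum(values) / n
    sxy = sum((y - mean_year) * (v - mean_value) for y, v in zip(years, values))
    sxx = sum((y - mean_year) ** 2 for y in years)
    return sxy / sxx, mean_value


def trend_at(years: List[int], values: List[float], year: int) -> float:
    """Trend value for a given year (interpolated or extrapolated)."""
    slope, level_at_mean = linear_trend(years, values)
    mean_year = sum(years) / len(years)
    return level_at_mean + slope * (year - mean_year)


def base_period_average(years: List[int], values: List[float], start: int, end: int) -> float:
    """Plain average over a fixed base period, such as the 1909-13 pre-war years."""
    chosen = [v for y, v in zip(years, values) if start <= y <= end]
    return sum(chosen) / len(chosen)


# Hypothetical series rising steadily by 2 units a year from 1909 to 1925.
years = list(range(1909, 1926))
acreage = [10.0 + 2.0 * (y - 1909) for y in years]

pre_war_base = base_period_average(years, acreage, 1909, 1913)
current_trend = trend_at(years, acreage, 1925)
```

For a series with a strong upward movement the pre-war average (14 in this made-up case) says little about the current level (42 on the trend line), which is the difficulty the reviewer raises for countries such as Canada and Australia.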

The second group of studies which are concerned with the broad aspects of 
wheat production should be of particular interest to publicists, economists, and 
people in the public service. The studies cover the economics of wheat produc- 
tion in the United States and competing countries and should be valuable to 
those who wish to understand the problems and outlook confronting the wheat 
raiser in the different wheat areas of our country. 

The wheat surplus problem of the United States is the subject of one of these 
studies, and although it was published in March 1925, it presents an analysis of 
a question that is of even greater importance today—namely, the handling of our 
surplus crops. The conclusion is stated that (contrary to belief in many quar- 
ters) it is not necessary to produce a surplus of wheat in the United States; that 
it is not needed as an insurance against famine; and that it would be possible so 
to adjust the acreage that it would produce on the average a crop only sufficient 
for home needs. This country was approaching that situation before the war 
and the opinion is expressed that this movement will continue as soon as war 
time influences are no longer felt. 


¹ Italics are the reviewer’s.
² See No. 3, p. 87, February 1925.


It is doubtful if this simple solution will meet with agreement among those 
who represent the wheat farmer, but those interested in the problem would do 
well to read this particular bulletin for one viewpoint—possibly the consumer’s 
viewpoint—on this important subject. 

HILDING ANDERSON


New York City 


Mathematics of Life Insurance, by L. Wayland Dowling, Ph. D. New York: 
McGraw-Hill Book Company. 1925. x, 121 pp. 

The first eight pages of this book review briefly the theory of compound in- 
terest, developing many of the usual formulae in connection with annuities. 
The author’s methods in these pages should prove stimulating to the student who 
is already familiar with the subject matter, but would doubtless prove very 
difficult for a beginner. The author points out in the preface that a knowledge 
of the applications of algebra to the theory of annuities is assumed. On page 3 
under the topic “Annuities” we find the following:

An annuity is a sum of money payable at stated intervals of time and for a 
period of years. Usually an annuity is payable annually and at the end of each 
year. If the annuity is payable at the beginning of each year, it is called an 
annuity-due. 

It is felt that the definition given in the first sentence can be improved upon. 
The accuracy of the second sentence is questioned. The third sentence appears 
as a definition of an annuity-due and limits it to one payable annually. The 
expression “contingent annuity,” although it may not be generally used in 
American texts, serves a real need and its use or the use of some other words with 
the same definition should be encouraged. It is not clear that the expression 
“term annuity” fills a real need.
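The distinction at issue, payment at the end of each period (annuity-immediate) versus payment at the beginning (annuity-due), together with the point that an annuity-due need not be payable annually, can be illustrated for an annuity-certain. The sketch below sums discounted instalments directly; the 4 per cent rate and the function name are illustrative assumptions, not taken from the book.

```python
def annuity_certain_pv(n_years: int, i: float, m: int = 1, due: bool = False) -> float:
    """Present value of an annuity-certain paying 1 a year for n_years years,
    in m equal instalments of 1/m a year, at annual effective interest i.

    due=False: each instalment at the end of its period (annuity-immediate).
    due=True:  each instalment at the beginning of its period (annuity-due).
    """
    v_period = (1.0 + i) ** (-1.0 / m)  # discount factor for one period of 1/m year
    first = 0 if due else 1             # time, in periods, of the first instalment
    return sum(v_period ** (k + first) for k in range(n_years * m)) / m


v = 1.0 / 1.04
immediate = annuity_certain_pv(10, 0.04)            # a_10 at 4 per cent
due = annuity_certain_pv(10, 0.04, due=True)        # ä_10 at 4 per cent
monthly_due = annuity_certain_pv(10, 0.04, m=12, due=True)
```

Because every instalment of the annuity-due is paid one period earlier, its value is (1 + i)^(1/m) times that of the corresponding annuity-immediate for any frequency m; for annual payments this reduces to ä = (1 + i) a = (1 − vⁿ)/d.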

After Chapter II on the “Theory of Probability,” life insurance proper is
introduced in Chapter III with a discussion of the mortality table. The first
sentence of the chapter seems unfortunate: “The probability of living, or dying,
at a given age is measured by a mortality table.” It would seem rather that the
mortality table is the result of a statistical effort to find out what the actual 
probability of living or dying is at different ages. Of course, for practical 
purposes, in conducting a life insurance business, it is necessary in many cal- 
culations to assume that deaths will be related in a definite manner to those 
expected by a chosen mortality table. The second sentence of this chapter 
reads as follows: 


The fundamental column in a mortality table exhibits the number of people 
left alive at each age out of an assumed number alive at a given early age until 
all are dead. 


In the minds of many there is little doubt that the fundamental column of 
a mortality table is the column showing the probability of death at each age.
However, this is not a matter of great importance since either column can be 
obtained from the other. 
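The remark that either column can be obtained from the other rests on the relations lₓ₊₁ = lₓ(1 − qₓ) and qₓ = (lₓ − lₓ₊₁)/lₓ. A minimal sketch, assuming a short hypothetical table of death rates ending in certain death and an arbitrary radix of 100,000; the function names are illustrative.

```python
from typing import List


def lx_from_qx(qx: List[float], radix: float = 100_000.0) -> List[float]:
    """Survivors column from death rates: l_{x+1} = l_x * (1 - q_x)."""
    lx = [radix]
    for q in qx:
        lx.append(lx[-1] * (1.0 - q))
    return lx


def qx_from_lx(lx: List[float]) -> List[float]:
    """Death rates from the survivors column: q_x = (l_x - l_{x+1}) / l_x."""
    return [(alive - next_alive) / alive for alive, next_alive in zip(lx, lx[1:])]


# Hypothetical death rates for a short table ending in certain death.
qx = [0.01, 0.02, 0.05, 0.20, 1.00]
lx = lx_from_qx(qx)          # one more entry than qx; the last is 0
recovered = qx_from_lx(lx)   # reproduces qx up to rounding
```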

On page 30 appears the statement: 


If we assume that the deaths in a stationary community are distributed
uniformly throughout each year, it is clear that the number of persons arriving at
age x is constantly the same; that is, there will be always lₓ persons alive at age x.


It would seem that the author has neglected some of the assumptions necessary 
to enable him to draw the conclusion that there will always be lₓ persons alive at
age x.

On page 32 under the topic “Expectation of Life” the author fails to point 
out in his first paragraph and in his definition that he is talking about whole 
years of life, the fractions of years of life lived in the years of death being omitted 
in calculating the expectation of life. 
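The distinction drawn here is between the curtate expectation, which counts only complete years lived, eₓ = (lₓ₊₁ + lₓ₊₂ + ...)/lₓ, and the complete expectation, which also credits the fraction of the year of death. A common approximation, assuming deaths are spread uniformly over each year of age, adds one half year. The survivors column below is hypothetical and the function names are illustrative.

```python
from typing import List


def curtate_expectation(lx: List[float], age: int) -> float:
    """Average number of complete future years lived by those alive at `age`:
    e_x = (l_{x+1} + l_{x+2} + ...) / l_x. A person dying in the k-th year
    after age x is credited with only the k - 1 whole years already completed."""
    return sum(lx[age + 1:]) / lx[age]


def complete_expectation_udd(lx: List[float], age: int) -> float:
    """Approximate complete expectation, assuming deaths are uniformly
    distributed within each year of age: e°_x is about e_x + 1/2."""
    return curtate_expectation(lx, age) + 0.5


# Hypothetical survivors column, indexed by years since the starting age.
lx = [1000.0, 800.0, 500.0, 100.0, 0.0]
e0 = curtate_expectation(lx, 0)
e0_complete = complete_expectation_udd(lx, 0)
```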

It would seem that an error has been made in defining “deferred annuities”
on page 41, where the statement is made that: 




The symbol n| indicates that the annuity does not become payable until after 
the expiration of n years; that is, the first payment falls due respectively at the 
end or at the beginning of the nth year. 

Usually the symbol n| indicates that the annuity period does not begin for n 
years and the first payment of such an annuity payable annually will be n+1 
years hence for an annuity-immediate and n years hence for an annuity-due. 
The notation shows that the author has the orthodox definition in mind and that 
he has merely misstated the meaning in the sentence quoted above. The author 
avoids the use of the term “annuity-immediate,” but introduces no term to
take its place. 
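Under the usual convention stated by the reviewer, an n-year deferred life annuity of 1 a year makes its first payment at time n + 1 if it is an annuity-immediate and at time n if it is an annuity-due, so the two values differ by exactly the pure endowment ₙEₓ = vⁿ lₓ₊ₙ/lₓ. A minimal sketch, assuming a hypothetical table of death rates and 4 per cent interest; the function names are illustrative.

```python
from typing import List


def survivors(qx: List[float], radix: float = 100_000.0) -> List[float]:
    """Build l_x from q_x: l_{x+1} = l_x * (1 - q_x)."""
    lx = [radix]
    for q in qx:
        lx.append(lx[-1] * (1.0 - q))
    return lx


def deferred_life_annuity(lx: List[float], x: int, n: int, i: float, due: bool) -> float:
    """Value at age index x of 1 a year, deferred n years, payable while alive.

    Annuity-immediate (due=False): first payment at time n + 1.
    Annuity-due (due=True): first payment at time n.
    """
    v = 1.0 / (1.0 + i)
    first = n if due else n + 1
    return sum(v ** t * lx[x + t] / lx[x] for t in range(first, len(lx) - x))


def pure_endowment(lx: List[float], x: int, n: int, i: float) -> float:
    """nE_x = v^n * l_{x+n} / l_x."""
    return (1.0 + i) ** -n * lx[x + n] / lx[x]


# Hypothetical death rates, ending in certain death.
qx = [0.01, 0.02, 0.03, 0.05, 0.08, 0.12, 0.20, 0.35, 0.60, 1.00]
lx = survivors(qx)
immediate = deferred_life_annuity(lx, 0, 3, 0.04, due=False)
due = deferred_life_annuity(lx, 0, 3, 0.04, due=True)
```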

On page 50 is the statement: “We seek, then, to find a net level annual pre-
mium that shall be admittedly too great during the younger ages and too small
during the older ages.” It seems quite doubtful if this conveys the author’s
meaning. At any rate the sentence is not written with sufficient care to avoid 
misunderstanding on the part of the student. It is unfortunate that the author 
has departed from the standard notation for temporary insurance and endow- 
ment insurance. This will doubtless cause confusion for the student when he 
reads the standard text books. 
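What the sentence on page 50 presumably intends is that a level premium exceeds the cost of a single year’s insurance (the “natural” premium v·qₓ₊ₜ) at the younger ages and falls short of it at the older ages. A minimal sketch for whole life insurance with the benefit payable at the end of the year of death, assuming a hypothetical table of rising death rates and 4 per cent interest: the net level premium Pₓ = Aₓ/äₓ is a weighted average of the natural premiums (weights vᵗ lₓ₊ₜ/lₓ), so it must lie between the first and the last of them. The function names are illustrative.

```python
from typing import List, Tuple


def survivors(qx: List[float], radix: float = 100_000.0) -> List[float]:
    """Build l_x from q_x: l_{x+1} = l_x * (1 - q_x)."""
    lx = [radix]
    for q in qx:
        lx.append(lx[-1] * (1.0 - q))
    return lx


def level_and_natural_premiums(qx: List[float], i: float) -> Tuple[float, List[float]]:
    """Net level annual premium P_x = A_x / ä_x for whole life insurance of 1,
    and the natural (one-year term) premiums v * q_{x+t}."""
    v = 1.0 / (1.0 + i)
    lx = survivors(qx)
    # A_x: sum of v^(k+1) * d_{x+k} / l_x over all future years.
    A = sum(v ** (k + 1) * (lx[k] - lx[k + 1]) for k in range(len(qx))) / lx[0]
    # ä_x: sum of v^k * l_{x+k} / l_x, premiums paid at the start of each year.
    a_due = sum(v ** k * lx[k] for k in range(len(qx))) / lx[0]
    natural = [v * q for q in qx]
    return A / a_due, natural


qx = [0.01, 0.02, 0.03, 0.05, 0.08, 0.12, 0.20, 0.35, 0.60, 1.00]  # hypothetical
P, natural = level_and_natural_premiums(qx, 0.04)
too_great_early = P > natural[0]   # level premium exceeds the first year's cost
too_small_late = P < natural[-1]   # and falls short of the last year's cost
```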

On page 62 we find the incomplete statement: “In words, the value of a policy 
must be the difference between what (x) would pay at age x+n and what he 
does pay.” This sentence will most certainly be confusing to the student. On 
page 63, in explaining the retrospective method of valuation, the following sen- 
tence is given in italics: 

The value of a policy at any time is equal to the accumulated value of the 
premiums paid since the issue of the policy less the net single premium for a 
term insurance covering the time elapsed since the issue of the policy. 

This statement is, of course, not true for any ordinary interpretation of the term
“net single premium.” In the equation following this statement, the author
equates values as of the date of issue of the policy. Preliminary to doing so he
states that: “The value of ₙVₓ at the time of issue of the policy is ₙEₓ · ₙVₓ.”
This is not exactly what he means. The value of ₙVₓ at any time is ₙVₓ. The
author uses the symbol ₙVₓ alike for continuous payment life policies and for
limited payment life policies. 
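The objection to the italicized rule on page 63 can be checked numerically. On the usual retrospective basis, the reserve at duration t is the accumulated value of the net premiums less the accumulated cost of insurance, accumulation including the benefit of survivorship, that is, division by the pure endowment ₜEₓ: ₜVₓ = (Pₓ · ä − A¹)/ₜEₓ, where ä is a t-year temporary life annuity-due and A¹ a t-year term insurance. Multiplying through by ₜEₓ gives the equation of values at issue. The sketch below assumes a whole life policy with level annual premiums, a hypothetical table and 4 per cent interest, and confirms that this agrees with the prospective reserve Aₓ₊ₜ − Pₓ äₓ₊ₜ; the function names are illustrative.

```python
from typing import List, Tuple


def survivors(qx: List[float], radix: float = 100_000.0) -> List[float]:
    """Build l_x from q_x: l_{x+1} = l_x * (1 - q_x)."""
    lx = [radix]
    for q in qx:
        lx.append(lx[-1] * (1.0 - q))
    return lx


def reserves(qx: List[float], i: float, t: int) -> Tuple[float, float]:
    """Prospective and retrospective net level premium reserves at duration t
    for whole life insurance of 1 issued at the first age in the table."""
    v = 1.0 / (1.0 + i)
    lx = survivors(qx)
    w = len(qx)  # years in the table; the survivors entry at index w is zero

    def A(x: int) -> float:  # whole life insurance at age index x
        return sum(v ** (k + 1) * (lx[x + k] - lx[x + k + 1]) for k in range(w - x)) / lx[x]

    def a_due(x: int) -> float:  # whole life annuity-due at age index x
        return sum(v ** k * lx[x + k] for k in range(w - x)) / lx[x]

    P = A(0) / a_due(0)
    prospective = A(t) - P * a_due(t)

    temporary_annuity = sum(v ** k * lx[k] for k in range(t)) / lx[0]
    term_insurance = sum(v ** (k + 1) * (lx[k] - lx[k + 1]) for k in range(t)) / lx[0]
    pure_endowment = v ** t * lx[t] / lx[0]
    retrospective = (P * temporary_annuity - term_insurance) / pure_endowment
    return prospective, retrospective


qx = [0.01, 0.02, 0.03, 0.05, 0.08, 0.12, 0.20, 0.35, 0.60, 1.00]  # hypothetical
pairs = [reserves(qx, 0.04, t) for t in range(len(qx))]
```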

On page 70 the author defines as “net reserves” what are usually called “net
level premium reserves.” This is especially unfortunate since he does it in
introducing the notion of preliminary term reserves, giving the impression
(correct for his definitions) that a preliminary term reserve is lower than the 
corresponding net reserve. After explaining the meaning of full preliminary term 
reserves the author makes the erroneous statement that: “This plan does not, of
course, effect the actual net premium charged to (x), which is Pₓ.” Incidentally
the word “affect” is evidently intended where the word “effect” is used.
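Under one common form of full preliminary term valuation for a whole life policy issued at age x, the first policy year is valued as one-year term insurance (valuation premium v·qₓ, reserve zero at its end), and from the second year on the policy is valued as if issued one year later, with valuation premium Pₓ₊₁ and reserve ₜ₋₁Vₓ₊₁. The sketch below, assuming a hypothetical table with rising death rates and 4 per cent interest, shows both points at issue: the valuation net premiums are no longer Pₓ, and the preliminary term reserve is lower than the net level premium reserve. The names are illustrative.

```python
from typing import List


def survivors(qx: List[float], radix: float = 100_000.0) -> List[float]:
    """Build l_x from q_x: l_{x+1} = l_x * (1 - q_x)."""
    lx = [radix]
    for q in qx:
        lx.append(lx[-1] * (1.0 - q))
    return lx


qx = [0.01, 0.02, 0.03, 0.05, 0.08, 0.12, 0.20, 0.35, 0.60, 1.00]  # hypothetical
i = 0.04
v = 1.0 / (1.0 + i)
lx = survivors(qx)
w = len(qx)


def A(x: int) -> float:
    """Whole life insurance of 1, payable at the end of the year of death."""
    return sum(v ** (k + 1) * (lx[x + k] - lx[x + k + 1]) for k in range(w - x)) / lx[x]


def a_due(x: int) -> float:
    """Whole life annuity-due of 1 a year."""
    return sum(v ** k * lx[x + k] for k in range(w - x)) / lx[x]


def P(x: int) -> float:
    """Net level annual premium for whole life issued at age index x."""
    return A(x) / a_due(x)


def nlp_reserve(x: int, t: int) -> float:
    """Net level premium reserve tV_x, by the prospective method."""
    return A(x + t) - P(x) * a_due(x + t)


def fpt_reserve(x: int, t: int) -> float:
    """Full preliminary term reserve: zero through the first year, then t-1V_{x+1}."""
    return 0.0 if t <= 1 else nlp_reserve(x + 1, t - 1)


first_year_valuation_premium = v * qx[0]   # one-year term cost at age x
renewal_valuation_premium = P(1)           # P_{x+1}
level_premium = P(0)                       # P_x
```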

In the chapter on “Policy Options” the author fails to distinguish between the
reserve of a policy and the guaranteed policy values, thus ignoring entirely the 
idea of a surrender charge. 

This book covers in seventy-nine pages most of the material which is essential 
as a basis for ordinary actuarial work. The reviewer feels that the book would 
be far more useful as a text if it covered twice as many pages. Most students 
will make more rapid progress with a lucid text written with the thought of an- 
ticipating the student’s difficulties and clearing up as many of them as possible in 
the text. It seems that the author has, in places, deliberately chosen the hardest 
method of development for a student to see instead of the most natural method. 
For instance, the formula for the single premium for a whole life insurance is 
developed by a method which is probably very difficult for most students going 
through the subject for the first time, and then the more natural development is 
given on the following page. Throughout the book the author loses no chance to 
use a summation sign. This makes for compactness, and probably for disgust on 
the part of the student. 

The four pages covering the whole subject of preliminary term valuation seem 
to be absolutely hopeless from the standpoint of all but the very exceptional 
student and even the best student must supplement from some other source what 
is in the text before he can hope to grasp preliminary term valuation in such a way 
that it will be of any practical use to him. 

There is little danger of writing a text on this subject which will be too simple 
for the good of the student. A text written from the standpoint of the student 
to whom the subject is new and who is having his difficulties in grasping even 
those points which will seem so simple later, such a text, designed to inspire 
the student rather than to discourage him, is of many times the value of a text 
written in a compact form, with every sentence expressed as generally as possible 
with the apparent intention of making it a challenge for the student to understand 
what it is all about. The compact text costs less but if a student’s time is worth 
anything the lucid text is worth many times as much as the compact. 

RAINARD B. ROBBINS


The Case of Bituminous Coal, by Walton H. Hamilton and Helen R. Wright, with 
the aid of the Council and Staff of the Institute of Economics. New York: 
The Macmillan Company. 1925. 307 pp. 

What the Coal Commission Found, edited by Edward Eyre Hunt, F. G. Tryon, and 
Joseph H. Willits, with a Foreword by John Hays Hammond. Baltimore: 
The Williams and Wilkins Company. 1925. 411 pp. 

The chaotic condition of the coal industry in recent years has been productive 
of a considerable body of remedial literature. Most of it, because it has evidenced
no great knowledge of the actual conditions of the industry, has been of little
value. The two books under review are not in that class. Though neither is likely
to prove to be the pillar of flame to lead the coal industry out of the wilderness 
into the promised land, both are valuable contributions to the economics of coal. 
Both are based on careful and scholarly studies of the problems of the industry. 

The authors of The Case of Bituminous Coal have caught the spirit of the in-
dustry in their style of presentation. They open with the line, “Here’s a pretty
mess,” from W. S. Gilbert, and at the head of each chapter is a stanza from one of
the Gilbert and Sullivan operas or from the Bab Ballads. And there the touch of 
comedy stops. Within the pages of the chapters there is only a searching and 
revealing analysis of the hopelessness and the tragedy of the industry. It is a tell-
ing contrast, for the industry holds much of comedy and of tragedy. Its aimless-
ness, its array of heroes and villains, their bombastic exchanges are all comic until
one appreciates their significance and their results. Then they become tragic. The
recent settlement of the anthracite strike has furnished one of the most hopeless
and at the same time one of the most humorous situations in the history of the
industry. In the first place it was a strike that should never have been called. After
many futile months, it was brought to a close through a settlement by exhaustion, 
a settlement that solved none of the problems of the industry and that promises 
little for the future. And yet, hollow as is the honor, the claimants for the hero’s 
role are legion. The strike was settled, it seems, by a President who uttered not a
word, by some dozen Senators who uttered many, by several Governors who hope 
to become Senators, and by a Cabinet member who, for a time, at least, had hopes 
of becoming a Governor, not to mention the many leading lights among the 
miners and the operators. 

Professor Hamilton and Miss Wright have selected as their particular point of 
attack the competitive system as it has functioned in the coal industry. After a
brief estimate of the importance of the industry, they present in a chapter on the 
Competitive Ideal all the orthodox arguments that are usually advanced to prove 
that free competition is all that is needed to hold industry to an orderly develop- 
ment, with the maximum benefits to the owners, the workers, and the consumers. 
In the next chapter, they point out that in the coal industry, at least, the ideal of 
an industry controlled solely by competition is only partially realized. The cor-
porate control of the business, the “cake of custom,” the tendency toward com-
bination within the industry, slight as it is, and the control from outside agencies
have all combined to hinder the realization of the advantages claimed for free 
competition.

Under the chapter head “The Spotted Actuality,” there follows an analysis of
what competition has really accomplished. It is a damning case. In place of the
blessings promised, competition has brought chaos and disorder. Instead of pro-
ducing a mine capacity closely approximating the demand for coal, it has been
responsible for an overdevelopment of almost one hundred per cent. To that over-
development may be traced most of the ills of the industry. The authors do not
emphasize sufficiently the part in the overdevelopment played by the abundance 
and the accessibility of the American reserves of bituminous coal. 

It is further pointed out that competition has not brought to the miners the
benefits attributed to it by its supporters. The industry is one of the most dan-
gerous, living conditions in the mining communities are bad, few of the amenities
of life have been supplied, and for all these shortcomings, which may be consid- 
ered perhaps as unavoidable conditions of coal mining, there has been for the 
miner no compensating higher level of wages. In fact, because of the irregular 
operation of the mines, wages have been lower than in other industries with much 
safer and more pleasant working conditions. For the consumer, competition has 
brought coal of uncertain quality with periods of surplus and periods of acute 
shortage and at prices that have advanced much more rapidly than the general 
price index. 

The remainder of the book is devoted to the consideration of the introduction 
of machinery and to its effects on the mining industry. Though an industrial 
revolution is already evident in coal mining, the authors do not believe that the 
technical improvements will be the long sought panacea. In fact, the machine 
will aggravate existing difficulties. Because of the heavier overhead, machine 
production may make for a more regular operation of the mines, but because of 
its greater capacity, it will mean the dismissal from the industry of a large number 
of miners, perhaps as many as one-half of those now employed. Because of the 
competition of the surplus labor, particularly the unorganized labor, before it can 
be shifted to some other industry, machine production holds the threat of lower 
wages, longer hours, and a relaxing of safety standards. It means the loss of per-
sonal freedom on the part of the miner, for with the coming of the machine he is
no longer an independent workman but a cog in the factory system. 

In the present volume the authors do not attempt to present their remedies for 
the coal industry. They content themselves with analyzing its present condition 
and with painting a vivid picture of the hopeless state to which it has been brought 
by the system of free competition. The discussion of remedies is promised for a 
future volume. The full value of the contribution of Professor Hamilton and Miss 
Wright will depend upon that future volume. The analysis of the ills of the indus-
try is the easy part of the task. It has been done many times before, though per-
haps in not so vivid or so interesting a fashion. Will they be able to prescribe with
the same skill that they have shown in diagnosis? 

Quite a different sort of book is What the Coal Commission Found, a symposium 
by members of the technical and economic staff of the late Coal Commission. 
Here is no comic-opera touch. It is a cold, rather colorless presentation of facts, but
they are facts without which The Case of Bituminous Coal could not have been 
written. What the Coal Commission Found is really a product of the procrastina- 
tion of Congress. The term of office of the Coal Commission ended on September 
22, 1923. Its final report was sent to Congress on December 10, 1923, and there it
rested for many months. Finally, despairing of early Congressional action, some 
of the experts of the staff arranged with a private publisher for the printing of a 
summary of the findings of the Commission. 

For a few months the resulting volume provided the only readily accessible 
record of the work of the Commission. The necessary action was finally taken by
Congress and in December, 1925, the complete report of the Coal Commission was
printed. It destroyed some of the value of the summary volume. Though one of
the announced purposes of What the Coal Commission Found was to make avail-
able in simpler form for the general public the vast amount of information
gathered by the Commission, the editors have not succeeded in that purpose. The
style is not popular and the text is heavy with statistics. It is not easy reading for
the casual reader, and the student of the coal industry will undoubtedly prefer to 
consult the well indexed pages of the official report. Between those two extremes 
there is a large group, including business men, journalists, and students of 
general economics, who will have need of a reference work on the coal industry 
that is authentic and at the same time more compact than the five volumes of the 
official report. For that group What the Coal Commission Found will continue to 
be of the greatest value. 

In the grouping of subjects the editors have followed the general plan of the 
complete report. The bituminous and anthracite industries are considered sepa- 
rately. For each industry there are chapters discussing such problems as the 
mining and marketing of coal, costs, prices and profits, wages and earnings, liv- 
ing conditions of the miners, and labor relations. 

The volume is concluded with a presentation of the final recommendations of 
the Commission. They have already been discussed widely in the press and in 
economic journals. Probably the most significant is the recommendation that 
there should be established a permanent body for the continuous collection and 
dissemination of the facts of coal mining and for the regulation of the industry. 
Because of a reluctance to add to the already large number of government 
bureaus and “because of the intimate relation of coal and transportation,” the
Coal Commission recommended that the fact-finding and regulatory agency 
should be a bureau of the Interstate Commerce Commission. Though there may 
be some disagreement regarding the wisdom of placing the control in the hands of 
that particular Commission, there can be no question of the necessity of creating 
a permanent fact-finding body. If the ills pointed out in The Case of Bituminous 
Coal are ever to be remedied, there must be a thorough understanding of the 
industry, both on the part of the immediate principals and of the public, based 
upon facts and not upon tradition and propaganda. The intelligent control of the 
industry cannot be left to spasmodic investigations of temporary commissions 
created in the midst of some emergency. 


JOHN E. ORCHARD

Columbia University


Railroad Accounts and Statistics, by Charles E. Wermuth. New York: Prentice- 
Hall, Inc. 1924. xiii, 349 pp. 


The author of this volume properly emphasizes in his prefatory remarks the 
advancement in railway accounting principles and practices in the United States. 

Unfortunately, his volume does not correspond in excellence to the careful 
and intelligent work that has been done for many years in the development of 
railroad accounting; nor does it conform to the object the author lays down in 
his preface, namely, to supply a descriptive and analytical outline of railway 
accounts and statistics. It is an almost totally inadequate presentation.

The book, containing 331 text pages, is little more than a series of lists of items,
serial groupings of accounts, and transcripts of blank forms and schedules, with 
a very thin connecting text. There is no introduction worthy of the name, 
there is no historical account of the development of accounting, there is no dis- 
cussion of either accounting or statistics from a logical, historical, or philosophical 
standpoint. 

Nearly two-thirds of the space, or 215 out of the 331 pages, comprises appen- 
dices that are nothing more than classifications quoted verbatim from the Inter- 
state Commerce Commission. These classifications can be secured at little or 
no cost from the Interstate Commerce Commission or from the Government 
Printing Office at Washington, and their inclusion in a volume of supposedly 
analytical character is hardly defensible. 

If ever a tail wagged a dog, here is an example: appendices, 215 pages; text, 
only 116 pages. Nor is this all. The 116 pages supposedly devoted to text 
are plentifully interlarded with lists of items, details of classifications, forms, 
charts, and the like. This section is divided into three parts, devoted to railroad 
accounts, to railroad statistics, and to the railroad report, respectively. The 
part dealing with statistics comprises only 20 pages, only a small portion of which 
is devoted to analysis, the great bulk being paraphrased from the Commission’s 
descriptive classifications. Only 14 pages are granted to a description of the 
“railroad report,” by which is meant the annual report that each railway
company files with the Interstate Commerce Commission.

The field which this volume attempts to occupy is fully occupied by the annual 
and occasional publications of the Railway Accounting Officers’ Association, 
some of the more general issues of which can be secured at nominal cost by any 
interested person. 

JULIUS H. PARMELEE

Washington, D. C. 


The Formative Period of the Federal Reserve System (During the World Crisis), 
by W. P. G. Harding, A. M., LL. D., Former Governor of the Federal Reserve 
Board. Boston and New York: Houghton Mifflin Company. 1925. ix,
320 pp. 

Governor Harding in this volume of about 255 pages of text undertakes 
to point out the essential features of the growth and development of the Federal 
Reserve System during its first decade. It so happened that that decade coin- 
cided with the World War and the readjustments that followed it. This makes 
Governor Harding’s book, like his own term of service as head of the Federal 
Reserve System, one of striking and noteworthy interest. 

Mr. Harding’s method of dealing with the subject he treats is to select out- 
standing issues or problems, envisage them in their broader aspect, eliminate 
very largely technical features, and then present his views in a semi-popular style 
with running commentaries. His plan makes the book comparatively easy read- 
ing, and there is no reason why the ordinary non-technical reader should not get 
from it a very fair notion of the chief questions with which the Reserve System
had to struggle during the early part of its existence. It may be noted, however,
that having determined upon this method of treatment and this style of pres- 
entation, Mr. Harding would probably have been better advised had he elimi- 
nated most of the various quotations and even the tabular matter which appear 
here and there in the course of the book. He either does not give enough of 
this kind of “documentation” or else he gives rather too much of it. If his
volume is intended for purely popular consumption the amount of this supporting 
matter is too large, while if his book is intended as a full or technical exposition of 
policy it is too little. 

But this after all is a technical defect and one readily to be forgiven by those 
who value authentic statements covering important events or policies, and 
originating with the men who were instrumental in directing them. Governor 
Harding gives his view moderately and conservatively in most cases with refer- 
ence to the fundamental questions with which he came in contact as Governor of 
the Federal Reserve Board. Much of what he says had already become known 
through his numerous speeches, official utterances printed in the Federal Reserve 
Bulletin, and in other ways, and yet there is no little value in having his ideas
assembled, segregated from other and irrelevant matter, and stamped with his 
own personality. 

In a general way the volume is the mental history of one who began work as a 
member of the Federal Reserve Board without, perhaps, very great confidence in 
the System he was sent to administer, and without immediate and direct prepara-
tion, but who nevertheless applied himself steadily and consistently to the per-
formance of duty and who gradually came to realize the significance of the work
he was doing and the importance of the contributions he might make to the 
welfare of his country at a time when soundness of finance was absolutely essen- 
tial. It shows him taking a faithful, conservative attitude on most of the issues 
with which he had to deal, and carrying his own position logically through in spite 
of bitter criticism in Congress and at times misunderstanding in other quarters. 

Governor Harding’s book necessarily leaves the reader in doubt as to his actual 
view on a good many points. He is more concerned with the explanation and 
settlement of particular issues that he had to face than he is with the general 
conditions of which they must be regarded as mere illustrations. This naturally 
prevents him from generalizing as he otherwise might. He is rather careful to 
avoid forecast or prediction and to adhere closely to an historical point of view. 
His underlying thought as to principles or problems is largely to be inferred from 
the position he took upon various actual issues rather than to be found in the 
form of direct statements. One exception to this general condition is to be 
noted, however, in his chapter on “Rediscount Functions,” where he develops a
self-consistent theory of discount rates, and there are other instances that might
be cited. They are, however, exceptions to the general tenor of his treatment.

It should be noted in closing that one of the most interesting features of Mr.
Harding’s review of reserve experience is found in a number of incidents which he 
narrates as throwing light upon personalities and points of view. To those who 
are already familiar with reserve experience some of these are very enlightening 
although to the general reader their significance may be less obvious. 







Mr. Harding’s volume deserves a place of convenient access in every collection 
of books having to do with American banking and banking experience. It will 
be referred to often in connection with future discussion and will have an increas- 
ing value as time goes on. 

H. Parker Willis

Columbia University 


Elementary Statistical Methods, by William G. Sutcliffe. New York: McGraw- 
Hill Book Company, Inc. 1925. xvii, 338 pp. 

Interest in the quantitative approach towards the study of sociological and 
economic problems is steadily increasing and finds an echo in the publication of 
books on statistical methods. 

The first five chapters of Mr. Sutcliffe’s book deal in succession with the following
subjects: “Statistics as a Science”; “The Need for Statistics”; “Problems
Prior to Collection of Data”; “The Collection of Primary Data”; and finally,
“Classification and Tabulation.”

On page 4 we find reference to Captain John Graunt. The publication of the
first edition of Natural and Political Observations Mentioned in a Following Index,
and Made upon the Bills of Mortality occurred in 1662.¹ On the same page reference
is made to J. B. Sussmilch. It should be Johann Peter Süssmilch, and on
page 6 it should be “Geschichte,” not “Geschicte.”

On page 5 the author states that biology has contributed much to statistical 
science both as to terminology and method. We will certainly agree with Mr. 
Sutcliffe, for in glancing at current texts we find that the economist has done 
most of his borrowing from biology as regards method. 

So strongly has the economist leaned on the biologist for his sources that the 
biometrical thread in the treatment of data goes through most text-book fabrics. 
I refer to the type of example offered in demonstrating the method. It seems 
that in teaching the subject it should be impressed upon the student that the tool 
he is about to learn to handle has wide possibilities of application to problems of 
the greatest variety which do not fall directly into the field of classic economics. 
To be more explicit, one finds, for instance, comparatively few engineers who are 
aware that their intrinsic mathematical equipment, which is often very much 
broader than that of the average economist, enables them to approach shop prob- 
lems from the statistical angle. But how is the student to know that this tool 
can be used advantageously in different fields of investigation, unless such typi- 
cal examples are brought to his attention? 

That there is a real need for such a point of view can be shown by a recent publication
offered by Dr. Ing Karl Daeves of Düsseldorf, Germany.² In
his Grosszahlforschung, in which Dr. Daeves gives himself credit for having found
“ein neues Arbeitsverfahren” (a new method of work), a statement which is, of course, far-fetched,
especially since Dr. Daeves himself has made not a single new contribution,

¹ Raymond Pearl, Introduction to Medical Biometry and Statistics, W. B. Saunders Company, 1923, p. 30.

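² Dr. Ing Karl Daeves, Grosszahlforschung. Grundlagen und Anwendungen eines neuen Arbeitsverfahrens für die
Industrieforschung mit zahlreichen praktischen Beispielen [Large-Number Research: Foundations and Applications
of a New Working Method for Industrial Research, with Numerous Practical Examples]. Düsseldorf: Verlag
Stahleisen m. b. H., 1924.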
















we find collected, as far as I was able to discover, for the first time examples 
demonstrating the application of the statistical method to pure industrial problems
as they occur in the factory, in the steel mill, in the chemical plant, and other places 
of shop activity. I refer particularly to the very broad use which has been made 
of the frequency curve as applied to subjects like the Brinell test, to gas analyses, 
to the calibration of planimeters, to elongation tests, and to numerous other typi- 
cal shop problems which are intensely interesting because they show the great 
range of application of the statistical science. The occasional demonstration of 
such examples would give the student still greater respect for the tool and its wide 
possibilities of application without leading him astray from the field of applied 
economics. After all, the majority of graduate economists must look sooner or 
later to the industrial field for the application of their knowledge. 

I make this lengthy statement because the laboratory exercises in present text- 
books on statistical methods have been confined to a rather narrow field, which 
might give the student the impression that the method is reserved exclusively for 
biologists and economists. It would be refreshing to find a new setting in a sta- 
tistical stage scenery, a setting which, if I may apply the term, reflects a little 
more our industrial age, without jeopardizing the pedagogical value of the book. 

The chapter on graphic presentation deals with the subject according to the
adopted standards.¹ Chart No. 30 on page 83 is a poor example of graphic presentation
and should have been redrawn. The organization of the legends, the lettering,
and the whole arrangement of the chart do not set an example which contributes
to the formation of good graphical habits.

Part II, which the author has entitled “Statistical Methods for Frequency Series,”
contains very detailed information on classification and tabulation, and
finally on graphic presentation, from which the student should benefit greatly. On
page 123, the author refers to the normal curve of error. Evidently the curve is
a free-hand drawing, and a more representative picture of the normal curve of
error would not have been out of place. On page 144, we read the statement that
the mean cannot be determined graphically. This is not correct. It would not
be practical to determine the mean graphically, but it can be done by a comparatively
simple method. Not only can the mean be determined graphically, but
also the weighted average, by means of the Vector Method, as I have shown in
the December 1921 issue of this Journal.²
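A representative curve of error can be plotted point by point from computed ordinates rather than drawn free-hand. The sketch below tabulates those ordinates with Python's `statistics.NormalDist`; the function name `normal_curve_ordinates`, the range of ±3 standard deviations, and the twelve intervals are arbitrary choices for illustration, not anything taken from Mr. Sutcliffe's book.

```python
from statistics import NormalDist


def normal_curve_ordinates(mean=0.0, sd=1.0, half_width=3.0, steps=12):
    """Hypothetical helper: (x, y) points of the normal curve of error, evenly
    spaced from mean - half_width*sd to mean + half_width*sd."""
    curve = NormalDist(mean, sd)
    xs = [mean + sd * (-half_width + 2 * half_width * i / steps) for i in range(steps + 1)]
    # y is the ordinate (probability density) of the curve at each x
    return [(x, curve.pdf(x)) for x in xs]


# Print a small table of ordinates at half-standard-deviation intervals.
for x, y in normal_curve_ordinates():
    print(f"{x:+.1f}  {y:.4f}")
```

Plotting these points and joining them gives a symmetric bell whose height at the mean is about 0.3989 for a unit standard deviation.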

In mentioning the normal curve of error, another question is brought up. Is 
it judicious to omit a chapter on elementary probability in an introduction to el- 
ementary statistical methods, especially in a book which covers so much ground? 
Definitions in the chapter on distributions and later in the chapter on correlation 
can be formed more clearly, and will emerge more sharply silhouetted when refer- 
ence can be made to the probability concept. I am of the opinion that frequency 
distribution and correlation should be preceded by a chapter on the probable error 

¹ On pages 75 and 76 the author makes reference to a diagram which appeared with a second chart in
the Harvard Business Review, October 1923. Both charts were prepared by the writer, as explained
in the footnote in the Harvard publication. It is evident that the chart has been redrawn for Mr. Sutcliffe
and that an error occurred in one of the legends. The correct name is “Devisenbeschaffungsstelle.”


² R. von Huhn, “A Vector Method for Computing a Weighted Average,” this Journal, December
1921, p. 1000.








concept and an elementary discussion of the theory of probability, and this with- 
out making the standard of the book post-graduate. 

When the author comes, on pages 215-218, to the explanation of the probable
error concept, which is largely done by quotations from other authors, and says
in his own words: “The probable error, as already indicated, gives the limits within
which the coefficient of correlation may be said to fluctuate owing to the possibility
of defective sampling,” and when he further says: “Coefficient of correlation
for table LVI equals 0.67±0.02, which indicates that the measure of
association may fluctuate between 0.69 or 0.65,” there is a looseness in the definition
which might have been avoided if reference could have been made to a more
detailed exposition of the probable error concept.

I can do no better than to quote Dr. Raymond Pearl’s very precise definition
of the probable error concept:¹ “A conventional measure of the reliability
of results, or put the other way about, of their ‘scatter’ due to the chance effects of
sampling, is used by statisticians and called the ‘probable error.’ It is a constant
so chosen that when its value is added to and subtracted from the result obtained,
or the numeric conclusion reached, it is exactly an even chance that the true result
or conclusion lies either inside or outside the limits set by the probable error in
the plus and minus direction. For example, if it is stated that the mean age at
death of persons dying in Baltimore is 39.83 ± 2.60 years, it means that the mathematical
probability that the true average falls between 37.23 years (39.83 − 2.60)
and 42.43 years (39.83 + 2.60) is exactly equal to the mathematical probability
that the true age falls outside those limits.” The precision of Pearl’s definition
leaves no possible doubt in the mind of the student as to what Mr.
Sutcliffe means by “to fluctuate owing to the possibility of defective sampling.”
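Pearl’s definition can be checked numerically. The sketch below assumes, as is conventional, that the probable error rests on the normal curve, so that it equals about 0.6745 standard errors; the large-sample formula 0.6745(1 − r²)/√N for the probable error of r is the standard textbook expression, not one quoted from Mr. Sutcliffe. The function names and the sample size N = 100 are hypothetical, since the number of observations behind table LVI is not given here.

```python
import math
from statistics import NormalDist

# Multiplier k with P(-k < Z < k) = 0.5 for a standard normal Z (about 0.6745).
PE_FACTOR = NormalDist().inv_cdf(0.75)


def probable_error_of_r(r: float, n: int) -> float:
    """Large-sample probable error of a correlation coefficient r from n pairs."""
    return PE_FACTOR * (1 - r * r) / math.sqrt(n)


def probable_error_limits(result: float, pe: float) -> tuple:
    """Lower and upper limits: the result minus and plus one probable error."""
    return (result - pe, result + pe)


# Pearl's example: mean age at death in Baltimore, 39.83 +/- 2.60 years.
low, high = probable_error_limits(39.83, 2.60)
print(f"{low:.2f} to {high:.2f}")  # 37.23 to 42.43

# Sutcliffe's coefficient, 0.67 +/- 0.02: an even chance of lying inside these limits.
low, high = probable_error_limits(0.67, 0.02)
print(f"{low:.2f} to {high:.2f}")  # 0.65 to 0.69

# Probable error of r = 0.67 for a hypothetical N = 100 pairs.
print(round(probable_error_of_r(0.67, 100), 4))  # 0.0372

# One probable error on either side encloses half of a normal distribution.
z = NormalDist()
print(round(z.cdf(PE_FACTOR) - z.cdf(-PE_FACTOR), 6))  # 0.5
```

On this reading, 0.65 to 0.69 are not bounds within which the coefficient “fluctuates”; they are limits with an even chance of containing the true value.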

On page 211 the author shows the derivation of the correlation coefficient by 
graphic methods, which is of value to the student and makes the demonstration 
much clearer. Non-linear correlation is not treated, and I think that a mention of
the index of correlation would not have taken the book into the realm of higher
statistical methods. Correlation on the basis of linear regression restricts the 
field of investigation considerably, especially when it is desirable to find certain 
maximum conditions which are not subject to the laws of linear equations. 

Part III deals with “Statistical Methods for Time Series.” Mr. Sutcliffe has
put his best into this part of the book. He has covered a great deal of
territory, theory as well as practice, and has shown in this section particularly
what is so very desirable: practical working methods in great detail. I think
Part III will be very beneficial to students whose guiding text-books are
other than Mr. Sutcliffe’s Elementary Statistical Methods and
who wish to do some collateral reading on this particular subject.

R. von Huhn


The Employment of Young Persons in the United States. National Industrial Con- 
ference Board, Inc. New York. 1925. 150 pp. 
This report of the National Industrial Conference Board attempts, in 150 
pages, nearly half of which are taken up by tables, charts, or maps, to discuss the 
¹ Raymond Pearl, Introduction to Medical Biometry and Statistics, W. B. Saunders Co., 1923, p. 213.

































significant data relating to employment of children and young persons up to 18
years of age. The only contribution based upon original investigation by the Conference
Board is the statement that, in the opinion of the officials of two-thirds of
the states replying to a questionnaire, the enforcement of the two Federal laws, 
before they were declared unconstitutional, helped the states materially in en- 
forcing their own laws—in part by improving state certificating methods, and in 
part by increasing the fear of prosecution because of the possibility of bringing 
cases in the Federal courts. Aside from this the report is made up of general 
statements of the problem as the Board sees it and of summaries, accompanied 
by appraisals, of existing data. 

The principal criticism of the report is that it fails to emphasize the importance 
of age groups, a failure which leads to confusion in the presentation of the prob- 
lem and to blurring of the issues and conclusions. The problems presented by 
the 8-year-old berry picker, by the still physically and mentally immature child 
of 14, and by the young apprentice of 17 are conspicuously different. The crux 
of the whole problem is the ages at which children should be permitted to go to 
work, the ages at which they should be permitted to work in particular occupations, 
and the ages at which they should be permitted to work under particular condi- 
tions. Yet, though age groups to which particular statistical data apply are usu- 
ally stated, and stated correctly, their importance seems to be ignored in many 
statements, both of the problem and of conclusions. We find many sweeping 
statements which seem completely to ignore the fact, mentioned elsewhere, that 
children not only enter industry untrained but remain so because the occu- 
pations open to them offer no training, and to forget the many other evils connected 
with commercializing the lives of little children. For example, the Foreword 
prophesies that “our young people will continue to participate in the productive 
life of the nation, not because such participation is immediately advantageous 
to them or to others, but because it is a necessary preparation for their future 
rôle as citizens and members of the working community.” And in the summary
we find such loose propositions as: “Participation in the economic life of their
environment before adult age is desirable and necessary for complete education 
and maturity of development of the young people of the nation, as well as for the 
promotion of good citizenship and of the social and economic welfare.” 

At the same time that the report thus spreads over the entire problem of the 
gainful labor of young children certain conclusions applicable, at best, only to 
the oldest age group under consideration, it also spreads over this older age group 
conclusions true only of the younger. For example, it lays great emphasis upon 
the large proportion of children in agricultural pursuits and especially upon the 
fact that 87 per cent of the child workers under 14 (10-13) years of age reported 
to the census in 1920 were engaged in agriculture. Yet of the entire group of 
young persons 10 to 17 years of age, inclusive, at work only 40.6 per cent were in 
agriculture, over a million and a half of them being engaged in manufacturing, 
mechanical and other non-agricultural occupations. 

That child labor “is not a problem to be solved by legislation alone, whether 
state or Federal,” is almost axiomatic, and no one is likely to deny that it is in 
part “a problem of supplying more and better educational facilities,” and in part










“a problem of stabilizing and improving adult earnings and family living stand- 
ards.” But it is to be regretted that this report seems to twist the need for 
better schools and better wages, on the one hand, into support for the outworn 
social theory that a hundred years ago placed upon the backs of little children 
the burden of their own support, and, on the other, into arguments against the 
legislation which during these hundred years has so largely taken children out of 
industry and given them at least a taste of the schooling a civilized society owes 
its young people. This twisting is effected in part by placing in the forefront of 
the discussion an entire chapter on the reasons children give for going to work, 
with special emphasis on poverty. Even if the statements on this point of chil- 
dren who are seeking their “working papers” were of any value as proving eco- 
nomic need for their labor, which no one could believe who has had any experience 
in passing out these coveted documents, they do not prove that in the interest of 
society children should go to work. 

The final point made in the report, that more detailed and up-to-date informa- 
tion is greatly needed, is undoubtedly true. A careful analysis of the problem 
itself, followed by an examination of the relationship of available information to 
each of its component parts, would have revealed that the evidence as to the 
social undesirability of the labor at least of the younger children is more convincing
than it appears when the problem is analyzed only vertically and not horizontally.
Nevertheless, a new survey, though one of a different type from that suggested
by the Industrial Conference Board, is needed. The census of population
must be taken far too rapidly and by agents with much too little training to in- 
clude questions relating to “the character of children’s occupations, the amount 
of time spent in them, the factors influencing their pursuit, and the extent to 
which they interfere with school attendance,” as the Conference Board suggests
it should do. Still less practicable would it be to include in the population schedule
questions relating to the exact character of the processes performed by children
in industry, the effects of various occupations on their health, and other
questions concerning which the Conference Board itself points out the need for 
information. Only a thorough, comprehensive study of the problem in all its 
aspects and as it affects all age groups can light the way out of the morass of mis- 
leading statements and inadequately supported conclusions in which public opin- 
ion and public responsibility appear to have become engulfed within recent years. 

Helen Sumner Woodbury


Modern Immigration: A View of the Situation in Immigrant Receiving Countries, 
by Annie Marion MacLean. Philadelphia: J. B. Lippincott Co. 1925.
393 pp. 

In the judgment of the reviewer this book is a valuable contribution to the 
literature of immigration. It deals, not with the conditions motivating emigra- 
tion, but with the immigration policies of the United States and six other chief 
immigrant receiving countries of the world—Canada, Australia, New Zealand, and 
South Africa of the British Empire, and Argentina and Brazil as the great immi- 
grant receiving countries of South America. 
















Following a brief survey of the general aspects of immigration, there are given, 
for each of these seven countries, a statement of the opportunities which it offers 
to the immigrant, the volume and character of immigration thereto, and the dis- 
tinctive features of its immigration policy as evidenced by legislation and other 
activities. 

The author notes certain similarities in the immigration policies of these sev- 
eral countries. All have enacted legislation designed to exclude the defective and 
criminal classes; and all tend to discourage the immigration of races or types considered
unassimilable. Particularly since the Great War, confidence in the efficacy
of the “melting pot” has given way to the recognition of “a Gresham’s Law
for peoples”; and the dominant note in immigration policy is self-protection 
rather than altruistic welcoming of the unfortunates of other lands. Several 
of the immigrant receiving countries give approved immigrants assistance in 
financing their passage. 

Significant differences also appear. In the United States the factory has attracted
the bulk of immigration in recent decades. The other six countries are more
sparsely settled and agricultural settlers are most desired. In Australasia we 
find a population of British descent, with a fear of Asiatic submergence which is 
crystallized in its slogan of a “White Australia”; and likewise in the other immigrant
receiving countries of the British Empire we find a strong preference for
white settlers and particularly for those of British descent. South Africa, with a
large colored population, welcomes British settlers with capital, but does not encourage
white laborers to immigrate. In Australasia, on the other hand, the
desire is for white agricultural workers, but there is a sentiment against the immi- 
gration of skilled laborers. Canada has endeavored through legislation and ex- 
tensive publicity to make effective a policy of selective immigration. The Eng- 
lish-speaking countries desire that immigrants shall be literate. In Argentina 
and Brazil the percentage of illiteracy is very high. 

In Argentina a large majority of the population is of Spanish stock, blended 
with native races; in Brazil, the white third of the population is mainly of Portu- 
guese descent. These countries offer “an ideal new-world home for Latin people”; 
and there has been a heavy immigration from some of the Latin countries, nota- 
bly Italy, which now finds an inadequate outlet in the United States. Owing to 
differences in language, customs, and agricultural methods, the English-speaking 
immigrant in South America, particularly if without capital, is seriously
handicapped.

An excellent chapter on “Aims and Problems” summarizes the principles
which dominate present-day immigration regulations and practices; and a closing
chapter on “The Future” clearly states the issues yet to be met.

A lengthy appendix of 150 pages contains verbatim the immigration and natu- 
ralization laws in force in the United States. 

On the whole, the discussion, while chiefly descriptive rather than searchingly 
analytical, is well balanced, with a touch of rational emotionalism that stresses, 
not immediate individual hardships, but the bearing of immigration policies on 
racial and national progress. The contents are well arranged for selective reading.

Harry Jerome







RECENT LITERATURE 


STATISTICAL BOOKS 


Benner, C. L. The Federal Intermediate Credit System. New York: Macmillan. 
1926. xviii, 375 pp. 

Blanchard, Ralph H. Workman’s Compensation in the United States. International
Labour Office Studies and Reports Series M (Social Insurance), No. 5.
Geneva. 1926. 103 pp.

Dublin, L. I. (Ed.). Population Problems in the United States and Canada. Boston:
Houghton Mifflin. 1926. xi, 318 pp.

Eigelberner, J. The Investigation of Business Problems. Chicago: A. W. Shaw.
1926. x, 335 pp.

Fisher, A. G. B. Some Problems of Wages and their Regulation in Great Britain since
1918. London: King. 1926. xvii, 281 pp.

International Labour Office. Wage Changes in Various Countries 1914 to 1925.
Studies and Reports Series D (Wages and Hours), No. 16. Geneva. 1926.
143 pp.

Kallbrunner, Hermann. Der Wiederaufbau der Landwirtschaft Österreichs.
Wien: Verlag von Julius Springer. 1926. v, 150 pp.

League of Nations, Health Organisation. International Health Year-Book,
1924. Reports on the Public Health Progress of Twenty-two Countries.
Geneva. 1925. 501 pp.

Levy, Hermann. Der Weltmarkt 1913 und heute. Leipzig and Berlin: Verlag und
Druck von B. G. Teubner. 1926. iv, 116 pp.

New York (State) Transit Commission. Annual Report, 1924. Albany. 1926.
612 pp.

Riggleman, J. R. Graphic Methods for Presenting Business Statistics. New York:
McGraw-Hill. 1926. xiii, 231 pp.

Segre, Mario. Le Banche nell’Ultimo Decennio. Milano: La Stampa Commerciale.
1926. iii, 82 pp. and charts.


GOVERNMENT PUBLICATIONS 


United States.
—— Bureau of the Census.
“Census of Prisoners: 1923 (Preliminary Report).”
“Cotton Production and Distribution, Season of 1924-25.”
—— Department of Agriculture.
“The Relation Between the Ability to Pay and the Standard of Living among
Farmers.”
Farmers’ Bulletin No. 1485. “Rural Hospitals.”
—— Department of Labor.
Bureau of Labor Statistics:
No. 402. “Collective Bargaining by Actors.”
Children’s Bureau:
No. 147. “References on Child Labor and Minors in Industry.”
No. 149. “Vocational Guidance and Junior Placement.”
No. 150. “Children Indentured by the Wisconsin State Public School.”



































No. 151. “Child Labor in Fruit and Hop-Growing Districts of the Northern
Pacific Coast.”
No. 152. “Industrial Accidents to Employed Minors in Wisconsin,
Massachusetts, and New Jersey.”
No. 155. “Child Labor in Representative Tobacco-Growing Areas.”
—— Treasury Department.
Public Health Reports:
Vol. 41.
No. 3. “Stream Pollution Investigation of the Public Health Service.”
No. 4. “A Study of Sickness among 133,000 Industrial Employees.”
No. 6. “Experiments Using Brewers’ Yeast to Supplement a Deficient
Diet”; “The Rate of Deoxygenation of Polluted Waters.”
No. 10. “Current World Prevalence of Disease”; “Division of Venereal
Diseases, July 1-December 31, 1925.”
—— Department of the Interior.
Bureau of Education:
No. 19, 1925. “Statistics of Land-Grant Colleges, 1923.”
Foreign Countries.
Gold Coast.
“A Review of the Events of 1925-26 and Prospects of 1926-27.” 205 pp.
India.
“Review of the Trade of India in 1924-25.” vii, 99 pp.
Italy. 
“Censimento della Popolazione del Regno d’Italia al Dicembre 1921. II. 
Venezia Tridentina.” xxi, 270 pp. 
New Zealand. 
“General Report. Results of a Census of New Zealand Taken April 1921.” 
xi, 232 pp. 
ARTICLES IN PERIODICALS 


MONTHLY LABOR REVIEW.

January, 1926. “Conditions in the glass manufacturing industry”; “The
bituminous-coal situation”; “Are average wage rates keeping pace with the
cost of living”; “Industrial pensions for old age and disability”; “Labor
passages in the President’s message”; “International statistics of per capita
output of coal”; “The Negro—A selected bibliography.”

February, 1926. “Changes in cost of living in the United States”; “Hours and
earnings in bituminous-coal mining”; “Wage rates for common labor”;
“Unemployment in foreign countries”; “Casualties of trainmen on Class I
railroads”; “Awards of compensation for temporary total and permanent partial
disabilities”; “Strikes and lockouts in the United States”; “Restriction
of immigration—Bibliography.”

March, 1926. “Progress in accident prevention”; “The Department of Labor
library”; “Trade-union movement of Germany and its problems”; “Purchasing
power (wholesale prices) of the dollar”; “Women’s industrial conference”;
“Results of studies on tetraethyl lead gasoline”; “Agreement in the
anthracite industry.”

April, 1926. “Are average wage rates keeping pace with the increased cost of
living (second article)”; “The longshoreman and accident compensation”;
“Physical examinations in industry”; “Railroad labor accomplishment”;
“Accident rates in underground work in metal mines.”

















THE JOURNAL OF POLITICAL ECONOMY.

February, 1926. Amy Hewes: “The Task of the English Coal Commission”;
S. F. Rigg: “The Chicago Teamsters’ Union”; H. R. Trumbower: “Railroad
Abandonments and Additions”; H. A. Wooster: “Manufacturer and Artisan,
1790-1840”; Leona M. Powell: “Typothetae Experiments with Cost Work”;
S. H. Slichter: “The Worker in Modern Economic Society.”

April, 1926. Wiesner and Ficek: “Education for Business in Czechoslovakia”;
J. Van Sickle: “The Fallacy of a Capital Levy”; G. W. Terborgh: “The
Purchasing-Power Parity Theory”; J. L. Laughlin: “The Industrial Outlook”;
R. C. McCrea: “Economics in a Business Curriculum”; Herbert Feis: “The
Queensland Basic Wage.”

POLITICAL SCIENCE QUARTERLY.

March, 1926. “Record of Political Events from January 1, 1925, to December
31, 1925.”

THE ANNALS.

March, 1926. “Legal Aid Work.”

THE JOURNAL OF ACCOUNTANCY.

February, 1926. James L. Dohr: “Valuation of Intangible Property before
Board of Tax Appeals”; A. J. Carson: “Silver Exchange Accounting in Oriental
Trade”; G. W. Wilde: “Computation of Earned Discounts.”

March, 1926. G. E. Frazer: “Valuations for Tax Purposes”; C. B. Couchman:
“Passing Examinations”; C. R. Whitworth: “Audit of an Automobile Manufacturing
Company”; J. R. Wildman: “Interpretation of Financial Statements.”

April, 1926. P. F. Brundage: “Treatment of No-Par-Value Stock in New York,
New Jersey, and Massachusetts”; H. S. Ford: “Auditing in Educational
Institutions”; Frederick Vierling: “Accumulation of Discounts.”

JOURNAL OF THE INSTITUTE OF BANKERS.

February, 1926. B. L. K. Henderson: “Our Mother Tongue, Lecture II”;
R. W. Jones: “Land Certificates as Cover for Advances.”

March, 1926. B. L. K. Henderson: “Our Mother Tongue, Lecture III”;
B. Campion: “Banking in Relation to Limited Companies, Gilbart Lecture
I”; H. F. Martin: “The Human Factor in Banking.”

April, 1926. B. L. K. Henderson: “Our Mother Tongue, Lecture IV”; B. Campion:
“Banking in Relation to Limited Companies, Gilbart Lecture II”; F. E.
Steele: “The Human Factor in Banking.”

JOURNAL OF THE ROYAL STATISTICAL SOCIETY.

January, 1926. G. U. Yule: “Why do we sometimes get nonsense correlations
between time series”; Sir Henry Rew: “The International Statistical Institute
and its Sixteenth Session”; A. N. Shimmin: “Distribution of Employment in
the Wool Textile Industry of the West Riding of Yorkshire.”

INTERNATIONAL LABOUR REVIEW.

January, 1926. E. Adler: “Inventions of Employees and the Austrian Patents
Act of 1925”; M. Martna: “Social Aspects of Land Reform in Esthonia”;
“Vocational Guidance in the United States of America”; “A Proposal for
National Insurance in Australia.”

February, 1926. E. Michel: “The Frankfort Academy of Labour and the
Problem of Workers’ Education”; E. Milhaud: “The Results of the Adoption
of the Eight-Hour Day”; “The International Trade Union Movement: Problems
of Organisation”; “Industrial Diseases: Analysis of Factory Inspection
Reports, 1920-1922.”










































March, 1926. W. A. Riddell: “The Influence of Machinery on Agricultural
Conditions in North America”; “The New Wage Act in South Africa”; “Pre-Apprenticeship
and Vocational Guidance in France”; “The New British Pensions
Act”; “Industrial Diseases: Analysis of Factory Inspection Reports
1920-1922, II.”

JOURNAL OF THE INSTITUTE OF ACTUARIES. 

March, 1926. D. Houseman: “Changes made by the new Law of Property
Acts, with special reference to Life Assurance Practice”; “Experience of American
Companies in regard to Blood Pressure”; Robert A. Bateman: “Legal
Notes”; J. H. Gunlake: “A simple group method of obtaining approximately
the special reserve required in the case of Terminable-Premium Policies for
expenses and profits after cessation of premiums.”

ZEITSCHRIFT FÜR VÖLKERPSYCHOLOGIE UND SOZIOLOGIE.

March, 1926. R. Thurnwald: “Führerschaft und Siebung”; A. W. Nieuwenhuis:
“Der primitive Mensch und seine Umwelt”; Georg Karo: “Der geistige
Krieg gegen Deutschland.”

JOURNAL DE LA SOCIÉTÉ DE STATISTIQUE DE PARIS.

January, 1926. M. J. Bourdon: “La Statistique des familles norvégiennes au
recensement de 1920”; M. Gaston Cadoux: “Nos pertes de guerre; leurs
réparations et nos dettes de guerre”; M. Marcel Barincou: “Chroniques des
transports.”

February, 1926. Marcel Moine: “Le développement de l’organisation
antituberculeuse en France”; M. A. Barriol: “Bibliographie: Paris Guide 1925,
Le Guide de la vie à Paris.”

March, 1926. M. W. Methorst: “Méthodes à suivre pour la préparation des
statistiques des stocks”; M. Yves-Guyot: “Prévisions relatives aux paiements
en nature des réparations et des dettes interalliés”; M. Ricard: “Chronique
des Banques et questions monétaires.”

GIORNALE DEGLI ECONOMISTI.

December, 1925. V. Porri: “Impressioni di Equilibrio Instabile Nel Movimento
delle Società per Azioni Italiane”; G. Zingali: “Il Salario della Donna Rispetto
a Quello dell’Uomo”; F. Vinci: “Il Metodo Statistico.”

January, 1926. C. Bresciani-Turroni: “La Crisi della (Stabilizzazione
Monetaria)”; G. Majorana: “Le Teorie della Moneta e del Valore in Aristotele”;
Paolo Albertario: “Ernesto Marenghi.”

February, 1926. U. Ricci: “L’Offerta del Risparmio”; G. Zingali: “G. B.
Salvioni.”

March, 1926. U. Ricci: “L’Offerta del Risparmio”; R. Vicentini: “Sulle
Variazioni dei Salari dal 1914 al 1924 in alcune Industrie di Milano”; Gustavo
del Vecchio: “Il Costo Quale Elemento della Teoria Economica.”

GIORNALE DI MATEMATICA FINANZIARIA. 

December, 1925. F. Insolera: “La Cassa Nazionale Infortuni e l’Assicurazione
Infortuni Agricoli”; E. Del Vecchio: “Il calcolo dei momenti delle funzioni del
rischio nell’ipotesi del Makeham”; P. Luckey: “Abbaco per il calcolo delle
annualità.”

METRON. 

I-IX, 1925. E. C. Rhodes: “On Sampling”; W. R. Dunstan: “Height and
Weight of School Children in an English Rural Area”; M. Boldrini: “Dubbi
intorno ad alcune leggi demografiche”; Novosselski and Paevski: “Life Tables
of the City of Leningrad”; M. Greenwood: “The Growth of Population in
England and Wales”; G. Zingali: “La popolazione della Sicilia preellenica”;
P. P. L. Fegiz: “I cognomi di S. Gimignano”; M. Halbwachs: “La population
et les tracés de voies à Paris depuis cent ans.”


MISCELLANEOUS REPRINTS AND PAMPHLETS 

POOR RELIEF IN PENNSYLVANIA, by Emil Frankel. Commonwealth of Pennsylvania,
Department of Welfare, Bulletin 21. 

REDUCTION IN PRICES THROUGH THE ELIMINATION OF INDUSTRIAL WASTE, by Herbert
Hoover. An extract from the Thirteenth Annual Report of the Secretary of 
Commerce. 

AUSTRALIAN SOCIALISM IN PRACTICE, by C. H. Northcott. Reprinted from Scientia,
December, 1925. 

BUILDING THE HUMAN MACHINE, The Cost of Food. Statistical Bulletin, Vol. VII,
No. 2. Metropolitan Life Insurance Company. 

CANCER IN NEW ZEALAND, by J. W. Butcher. Extract from the New Zealand Official
Year-Book, 1926. 

OFFICIAL VITAL STATISTICS OF THE REPUBLIC OF AUSTRIA. Statistical Handbooks
Series: No. 5. League of Nations, Health Organisation. 

STATISTIQUES DÉMOGRAPHIQUES ET SANITAIRES DE LA BELGIQUE. Extract from Volume
XXII of the Bulletin de la Commission centrale de statistique.

DU TAUX DÉCROISSANT DE LA MASCULINITÉ DANS LES NAISSANCES LÉGITIMES, by
C. Jacquart. XVI Session of the Institut International de Statistique, Rome,
1925.





